The experimental calorimetric two-state criterion requires the van't Hoff enthalpy
$\Delta H_{\mathrm{vH}}$ around the folding/unfolding transition midpoint to be equal or very
close to the calorimetric enthalpy $\Delta H_{\mathrm{cal}}$ of the entire transition. We use an
analytical model with experimental parameters from chymotrypsin inhibitor 2 to elucidate the
relationship among several different van't Hoff enthalpies used in calorimetric analyses. Under
reasonable assumptions, the implications of these $\Delta H_{\mathrm{vH}}$'s being approximately
equal to $\Delta H_{\mathrm{cal}}$ are equivalent: Enthalpic variations among denatured
conformations in real proteins are much narrower than some previous lattice-model estimates,
suggesting that the energy landscape theory "folding to glass transition temperature ratio"
$T_f/T_g$ may exceed 6.0 for real calorimetrically two-state proteins. Several popular
three-dimensional lattice protein models, with different numbers of residue types in their
alphabets, are found to fall short of the high experimental standard for being calorimetrically
two-state. Some models postulate a multiple-conformation native state with substantial
pre-denaturational energetic fluctuations well below the unfolding transition temperature and/or
predict a significant post-denaturational continuous conformational expansion of the denatured
ensemble at temperatures well above the transition point. These scenarios either disagree with
experiments on protein size and dynamics, or are inconsistent with conventional interpretation
of calorimetric data. However, when empirical linear baseline subtractions are employed, the
resulting $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s for some models can be increased to
values closer to unity; and baseline subtractions are found to correspond roughly to an
operational definition of native-state conformational diversity. These results necessitate a
re-assessment of theoretical models and experimental interpretations.
Introduction

In recent years, protein folding has been investigated 
extensively by statistical mechanical modeling (see re- 
views in Refs. 1-14, Refs. 15-23, and references therein). 
The relevance of these models to the basic understanding 
of microscopic energetics is premised on the tenet that 
macroscopic properties of a system are consequences of 
the properties of its microscopic constituent parts. It 
follows that insight and rationalization can be gained by 
constructing models and testing whether the presumed 
microscopic interactions are effective in reproducing 
experimental macroscopic behaviors. 24 High-resolution 
force-field potentials have been used to study protein 
folding 25 and unfolding. 26-28 Obviously, atomistic mod- 
els are indispensable for structural details. But at present 
it is not computationally feasible to use them to model 
thermodynamics and kinetics at millisecond or longer 
time scales. Also, it remains an open question whether 
empirical force fields would ultimately be adequate for 
predicting dynamics over long simulations. 29 Currently, 
a significant fraction of thermodynamics and kinetics 
data of proteins can only be addressed by complementary
approaches, mainly via polymer models with highly
simplified representations of the geometry and interactions
of the polypeptide chain. 1-4,15,30 Aside from their
computational tractability, it is hoped that these simplified
models may lead to the development of novel
(as-yet-undiscovered 31) concepts. Such "mesoscopic"
organizing principles 31 may be needed to bridge our
understanding over gaps of many orders of magnitude in
time and length scales separating the fundamental con- 
stituent atomic processes and the global features of a 
bio-macromolecule. 

Simple self-contained polymer models can be used 
to explore microscopic energetics of proteins. 

How do simple polymer protein models contribute 
to our physical understanding of proteins? Typically, 
the ingredients of such a model are (i) a conforma- 
tional space that accounts for chain connectivity and ex- 
cluded volume, and is sufficiently simple to be enumer- 
ated exhaustively 4,32 or sampled extensively, 3,7,11 and (ii) 
a set of rules (a potential function) that describes the 
"microscopic" interactions among the constituent parts 
of the chain. The most important feature of such a 
model is the conceptual clarity it offers because it is self- 
contained. This means that all properties and predic- 
tions of the model are derived solely from the postulated 
elementary microscopic ingredients. In particular, con- 
formational ensembles are determined by applying the 
model potential function (ii) to ascertain the energetic
favorability of every conformation in the model confor- 
mational space (i). Most recent lattice protein models 
belong to this category. However, some protein mod- 
els are not self-contained in this sense. In some ther- 
modynamic treatments 33-36 , for example, the unfolded 
or denatured state of a protein is postulated to contain 
only random-coil-like conformations, but with no specification
as to what microscopic interactions are responsible
for such a remarkable property in discriminating against 
compact nonnative conformations. (See discussions in 
Ref. 23.) As such, non-self-contained models involve ei- 
ther unspecified or unjustified mechanisms that are not 
explicitly considered as parts of their microscopic poten- 
tial functions. Therefore, their explanatory power is lim- 
ited because they cannot make a full logical connection 
between the macroscopic properties they predict and the 
microscopic interactions they explicitly consider, though 
they can provide important insight and be very useful in 
other respects. 

Self-contained simple polymer models of proteins help 
frame our discourse in terms of basic physical interac- 
tions. They sharpen our focus on whether certain global 
properties can or cannot arise from the microscopic inter- 
actions presumed by a model. In these models, however, 
the necessity to simplify implies that one has to rely 
to a degree on intuitive judgement in the design of ap- 
propriate model representations to capture polypeptide 
properties. In principle, many simple models can give 
similar results. A successful prediction can therefore
be fortuitous. It follows that the ability to reproduce 
a protein property is necessary but not sufficient for 
the validity of the presumed microscopic features of a 
model. On the other hand, if properties of a model are 
in disagreement with experimental data, it is a clear 
indication of deficiencies. Since simple models appear 
to enjoy a high degree of latitude in their design, it 
might be expected that reproducing general, "generic" 3 
properties of proteins would be straightforward. This 
is not the case. To the contrary, using simple models 
with physically plausible interactions to reproduce sev- 
eral thermodynamic 23,38 and kinetic 19,39 properties of 
proteins has been shown to be not trivial and requires 
in-depth analyses. This may be a blessing in disguise, 
because it means that a lot can be learnt about micro- 
scopic protein energetics from generic protein properties 
by using the latter as restrictive experimental constraints 
on models, to provide insight into what form of micro- 
scopic interactions are more likely to be proteinlike. 

The calorimetric criterion for thermodynamic 
two-state cooperativity requires a narrow 
denatured-state enthalpy distribution. 

One generic protein property that apparently has
not been fully appreciated by modelers is the calorimetric
two-state behavior of many small single-domain
proteins, 40,41 which requires the van't Hoff enthalpy
$\Delta H_{\mathrm{vH}}$ around the folding/unfolding transition midpoint
to be equal or very close to the calorimetric enthalpy
$\Delta H_{\mathrm{cal}}$ of the entire transition. Thermodynamic properties
of several simple polymer models have recently been
compared with this experimental criterion for two-state 
cooperativity. 22,23 One of us 23 argued that, under rea- 
sonable assumptions, the calorimetric two-state condi- 
tion requires the average enthalpy difference between the 
denatured and native ensembles around the heat denatu- 
ration midpoint not to further increase appreciably as the 
temperature is raised to complete the unfolding process. 
From analyses of analytic as well as two-dimensional lat- 
tice models, this is found to imply that the enthalpy dis- 
tribution among the denatured ensemble of conforma- 
tions has to be narrow in comparison with the average 
enthalpy difference between the native state and the de- 
natured state. 23 In the present study, we provide further 
support for this view by determining systematically the 
effects of using several slightly different common defini- 
tions of van't Hoff enthalpy for the calorimetric two-state 
criterion. 

A number of two-dimensional lattice protein models 
have been evaluated against the calorimetric criterion. 23 
Interestingly and unexpectedly, both a Go 15,19 and a 
Go-like HP+ (Ref. 19) model are found to be far away 
from being calorimetrically two-state. Apparently, insofar
as the underlying chain model is highly flexible,
even for these models with native-specific pairwise additive
contact interactions (these interaction schemes
are sometimes referred to as being "nearly maximally
unfrustrated" 42,43), the denatured enthalpy distributions
in these two-dimensional models are still too broad to
satisfy the calorimetric two-state standard. Based on
these results, it has been suggested that a cooperative
interplay between local and nonlocal interactions in proteins
may be necessary to give rise to calorimetrically
two-state behaviors. 23 In the present work, we evaluate 
six three-dimensional lattice protein models. These in- 
clude two- 44 and three-letter 45 models, a Go model, 46 
a "solvation" model 47 and 20-letter models with 48 and 
without 49 sidechains. Their thermodynamics are checked 
against the calorimetric criterion. We also evaluate the 
physical pictures of native and denatured states offered 
by some of these models in light of other experimental 
measurements on protein folding/denaturation transi- 
tions. 

Results and Discussion 

Overview of an analytical treatment. 

To provide a basic theoretical underpinning, we first
re-examine several definitions of van't Hoff enthalpy
($\Delta H_{\mathrm{vH}}$'s) in the protein folding literature, and the
consequences of using different $\Delta H_{\mathrm{vH}}$'s in the calorimetric
two-state criterion $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} \approx 1$. The main result
of this section, to be demonstrated below, is that under
reasonable, minimal assumptions regarding protein
conformational properties, calorimetric two-state criteria
using several commonly employed $\Delta H_{\mathrm{vH}}$'s imply essentially
equivalent requirements on a protein's density of states. 23
We approach this by comparing the $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$ values
using different $\Delta H_{\mathrm{vH}}$'s computed for a series of analytical
models with a wide range of thermodynamic cooperativities.

We begin by recalling a few basic relations. As discussed
in detail previously, 23 the main thermodynamic
quantities of interest for the issues at hand are the excess
enthalpy and heat capacity. Experimentally, raw
calorimetric data consist of heat capacity scans over a
range of temperatures, from which an excess enthalpy

$$\langle \Delta H(T) \rangle = \langle H(T) \rangle - H_{\mathrm{N}} \tag{1}$$

as a function of absolute temperature $T$ can be obtained
by standard baseline subtraction and numerical
integration techniques. 41 Here $H$ is the enthalpy
of the entire "excess" system, 23,41 $H_{\mathrm{N}}$ is the enthalpy
of the native state, and $\langle \ldots \rangle$ denotes Boltzmann
averaging. In general, the native enthalpy $H_{\mathrm{N}}$ should be
replaced by a Boltzmann average $\langle H_{\mathrm{N}}(T) \rangle$ over
conformational variations in the native state. (See
discussions below on 20-letter models with and without
sidechains.) Here we adopt as a working assumption
that the native state becomes effectively a single conformation
with a single temperature-independent enthalpy
value after proper baseline subtractions. 23 The calorimetric
enthalpy is $\Delta H_{\mathrm{cal}} = \langle \Delta H(T_1) \rangle$, evaluated at a
sufficiently high temperature $T_1$ at which the heat denaturation
process is completed ($T_1$ may be formally taken to be $\infty$ in model
considerations). 23 The expression for the excess heat
capacity function
$$C_P(T) = \frac{d\langle \Delta H(T) \rangle}{dT} = \frac{\langle H^2(T) \rangle - \langle H(T) \rangle^2}{k_B T^2} \tag{2}$$

follows from standard statistical mechanics, 23 where $k_B$
is Boltzmann's constant. Equation (2) corresponds to
$\Delta C_P$ in the calorimetric literature ($\Delta C_{P,\mathrm{tr}}$ in Ref. 41).
We drop the symbol $\Delta$ here for the excess heat capacity,
as in Ref. 23, to simplify notation.
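As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the excess enthalpy and the excess heat capacity for a discrete list of enthalpy levels. It is not part of the original analysis: the function name, the `(H, g)` input format, and the choice of units with $k_B = 1$ (enthalpies expressed in kelvin) are assumptions made here for illustration, and the native state is assumed to be the level at $H = 0$.

```python
import math


def excess_enthalpy_and_heat_capacity(levels, T):
    """Return (<Delta H>, C_P) at temperature T for a discrete density of states.

    levels: list of (H, g) pairs, with enthalpy H in units of k_B (kelvin) and
            degeneracy g. Assumption: the native state is the entry with H = 0,
            so that <Delta H> = <H> - H_N = <H>.
    """
    # Log Boltzmann weights, shifted by their maximum to avoid overflow
    # when degeneracies are astronomically large.
    logw = [math.log(g) - H / T for H, g in levels]
    shift = max(logw)
    w = [math.exp(x - shift) for x in logw]
    Z = sum(w)
    mean_H = sum(wi * H for wi, (H, _) in zip(w, levels)) / Z
    mean_H2 = sum(wi * H * H for wi, (H, _) in zip(w, levels)) / Z
    # Eq. (1) with H_N = 0, and the fluctuation formula of Eq. (2) with k_B = 1.
    return mean_H, (mean_H2 - mean_H ** 2) / T ** 2


# Illustrative strictly two-level system: one native state and g denatured states.
levels = [(0.0, 1.0), (3.0e4, 5.68e38)]
T_mid = 3.0e4 / math.log(5.68e38)  # temperature at which both levels are equally populated
dH_mid, cp_mid = excess_enthalpy_and_heat_capacity(levels, T_mid)
```

For such a two-level system the heat capacity at the midpoint equals $h^2/(4T^2)$ in these units, where $h$ is the enthalpy gap, which gives a convenient check on any implementation of the fluctuation formula.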

Several different definitions of $\Delta H_{\mathrm{vH}}$ have been put
forth in the protein calorimetric literature. 22,23,40,41,50,51
In general, their values can be very different. This raises
the possibility of complications in comparison between
theory and experiment. In Ref. 23, one of us noted that
while different $\Delta H_{\mathrm{vH}}$'s may be different when the transition
is far from being calorimetrically two-state (i.e.,
two-state as defined by the condition $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} \approx 1$
using any one of the $\Delta H_{\mathrm{vH}}$'s), a semi-quantitative argument
can infer that for proteins which can be fully
denatured by heat, $\Delta H_{\mathrm{vH}} \approx \Delta H_{\mathrm{cal}}$ for one $\Delta H_{\mathrm{vH}}$ would
imply that the same approximate equality also holds for
other $\Delta H_{\mathrm{vH}}$'s. Here we substantiate this inference by
quantitatively analyzing a class of models for protein
densities of states.

Definitions of protein folding van't Hoff enthalpies.

In general, a temperature-dependent van't Hoff enthalpy
is given by

$$\Delta H_{\mathrm{vH}}(T) = k_B T^2 \frac{d \ln K_{\mathrm{eff}}}{dT} = k_B T^2 \frac{1}{\theta(1-\theta)} \frac{d\theta}{dT} \tag{3}$$

where $K_{\mathrm{eff}}$ is the apparent 52,53 or effective 22,51 equilibrium
constant of the system, and $\theta = \theta(T)$ is a two-state
progress parameter for tracking the transition process;
$K_{\mathrm{eff}} = \theta/(1-\theta)$ and $\theta$ takes values from zero (at low
temperatures in the present cases) to unity (at high temperatures).
For heat denaturation of proteins, $\theta = 0$
and $\theta = 1$ correspond respectively to the completely native
(fully folded) and fully denatured (unfolded) states. (See footnote 1.)
Therefore, at the midpoint temperature $T_{\mathrm{midpoint}}$ of the
parameter $\theta$, i.e., when $\theta(T = T_{\mathrm{midpoint}}) = 1/2$,

$$\Delta H_{\mathrm{vH}} = 4 k_B T_{\mathrm{midpoint}}^2 \left. \frac{d\theta}{dT} \right|_{T = T_{\mathrm{midpoint}}} \tag{4}$$

As in Ref. 23 and as is customary in the calorimetric literature,
$\Delta H_{\mathrm{vH}}$ is understood to be evaluated at a certain
midpoint temperature when its $T$ dependence is not
shown explicitly.
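The general definition of a temperature-dependent van't Hoff enthalpy above can be sketched numerically as follows. This is only an illustrative Python sketch, not the authors' code: the function names, the central-difference step `dT`, the units with $k_B = 1$, and the two-state parameters `h` and `ln_g` are all assumptions chosen for the example.

```python
import math


def vant_hoff_enthalpy(theta, T, dT=1e-3):
    """van't Hoff enthalpy T^2 / [theta (1 - theta)] * d(theta)/dT, with k_B = 1.

    theta: callable returning the progress parameter at a given temperature.
    The derivative is estimated by a central finite difference (an assumption
    of this sketch; any other differentiation scheme would do).
    """
    th = theta(T)
    dtheta_dT = (theta(T + dT) - theta(T - dT)) / (2.0 * dT)
    return T * T * dtheta_dT / (th * (1.0 - th))


def two_state_theta(T, h=3.0e4, ln_g=math.log(5.68e38)):
    """Denatured fraction of a strictly two-state system with K = g * exp(-h / T).

    h (enthalpy gap in kelvin) and ln_g (log degeneracy) are illustrative values.
    """
    ln_K = ln_g - h / T
    if ln_K >= 0.0:
        return 1.0 / (1.0 + math.exp(-ln_K))
    e = math.exp(ln_K)
    return e / (1.0 + e)


T_mid = 3.0e4 / math.log(5.68e38)        # theta = 1/2 here
dH_vH = vant_hoff_enthalpy(two_state_theta, T_mid)
```

For a strictly two-state system the result equals the enthalpy gap `h` at every temperature, so any departure of the van't Hoff enthalpy from the calorimetric enthalpy signals a deviation from two-state behavior for the chosen progress parameter.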

It follows that different choices of $\theta$ would result in different
van't Hoff enthalpies and different midpoint temperatures.
The theoretical population-based $\Delta H_{\mathrm{vH}}$ in
Ref. 23 corresponds to $\theta = [D]$, the denatured fraction
of the total population, and a midpoint temperature $T_{1/2}$
at which one half of the chain population is denatured.
Here we use $\kappa_0$ to denote the $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$ ratio of this
population-based van't Hoff enthalpy to the calorimetric
enthalpy. Experimentally, the heat absorbed by the system
is often used to quantitate the degree of progress of
the transition process under a two-state assumption by
setting $\theta = \langle \Delta H \rangle / \Delta H_{\mathrm{cal}}$, with a corresponding midpoint
temperature $T_d$ at which one half of the total calorimetric
heat ($\Delta H_{\mathrm{cal}}/2$) has been absorbed (Ref. 51). This
leads to a van't Hoff enthalpy which is proportional to
the excess specific heat at $T_d$ (see below).

Footnote 1: $\theta$ is equivalent to Lumry et al.'s 52
$[\langle \alpha \rangle(T) - \langle \alpha \rangle_A(T)] / [\langle \alpha \rangle_B(T) - \langle \alpha \rangle_A(T)]$,
where $\alpha$ is an observable [their Eq. (4)].

On the other hand, a "square-root" van't Hoff enthalpy
formula has also been used by Privalov and coworkers 40,50
to analyze experimental data. It takes the form

$$\Delta H_{\mathrm{vH}} = 2 T_{\mathrm{midpoint}} \sqrt{k_B C_P(T_{\mathrm{midpoint}})} \tag{5}$$

Apparently this corresponds to setting $\theta(T) =
\langle \Delta H(T) \rangle / \Delta H_{\mathrm{vH}}$, and assuming that it is a valid progress
parameter. Equation (5) is used in conjunction with either
the peak temperature $T_{\max}$ of $C_P$ (Ref. 40) or $T_d$
(Ref. 50) as midpoint temperatures at which $\theta = 1/2$ is
presumably a good approximation (see also Ref. 23). To
ascertain the effects of different $\Delta H_{\mathrm{vH}}$'s on the calorimetric
criterion, we compare the population-based $\kappa_0$ defined
above with the following possible van't Hoff to calorimetric
enthalpy ratios using different midpoint temperatures
for the square-root formula 23,40,50:

$$\begin{aligned}
\kappa_1 &= 2 T_{1/2} \sqrt{k_B C_P(T_{1/2})} / \Delta H_{\mathrm{cal}} , \\
\kappa_2 &= 2 T_{\max} \sqrt{k_B C_P(T_{\max})} / \Delta H_{\mathrm{cal}} , \\
\kappa_3 &= 2 T_d \sqrt{k_B C_P(T_d)} / \Delta H_{\mathrm{cal}} .
\end{aligned} \tag{6}$$

Finally, it is not difficult to see that the van't Hoff to
calorimetric enthalpy ratio for $\theta = \langle \Delta H \rangle / \Delta H_{\mathrm{cal}}$ above
is given 51 by $(\kappa_3)^2$. So we also consider $(\kappa_1)^2$, $(\kappa_2)^2$ and
$(\kappa_3)^2$ as possible van't Hoff to calorimetric enthalpy ratios.
The definitions and usage of these quantities are
summarized in Table I.
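The relation between the square-root ratios and their squares can be made explicit with a small Python sketch. The reasoning is that setting $\theta = \langle \Delta H \rangle / \Delta H_{\mathrm{cal}}$ gives $d\theta/dT = C_P/\Delta H_{\mathrm{cal}}$, so the midpoint van't Hoff enthalpy becomes $4 k_B T^2 C_P / \Delta H_{\mathrm{cal}}$ and its ratio to $\Delta H_{\mathrm{cal}}$ is the square of the corresponding square-root ratio. The function names and the units with $k_B = 1$ are assumptions of this sketch, and the heat capacity values used are purely illustrative.

```python
import math


def kappa_square_root(T_mid, C_P_mid, dH_cal):
    """Square-root ratio 2 T sqrt(k_B C_P) / Delta H_cal, with k_B = 1."""
    return 2.0 * T_mid * math.sqrt(C_P_mid) / dH_cal


def kappa_heat_fraction(T_mid, C_P_mid, dH_cal):
    """Ratio obtained with theta = <Delta H> / Delta H_cal:
    4 T^2 C_P / (Delta H_cal)^2, i.e. the square of the square-root ratio."""
    return 4.0 * T_mid ** 2 * C_P_mid / dH_cal ** 2


# Illustrative numbers: a two-state peak height h^2 / (4 T^2), then half of it.
h, T = 3.0e4, 336.0
cp_two_state = h ** 2 / (4.0 * T ** 2)
k_ideal = kappa_square_root(T, cp_two_state, h)            # two-state limit
k_low = kappa_square_root(T, 0.5 * cp_two_state, h)        # less cooperative
k_low_sq = kappa_heat_fraction(T, 0.5 * cp_two_state, h)
```

Both ratios are unity in the two-state limit; below unity the square-root form is always the larger of the two, so it is the more lenient criterion.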

Despite their different definitions, several van't 
Hoff enthalpies give essentially the same calori- 
metric two-state criterion. 

We now compute these different van't Hoff to calorimetric
enthalpy ratios for a class of models that intuitively
capture the most basic features of protein energetics,
which are an essentially unique native state as
the lowest (ground) enthalpic state of the system, and
a huge number of unfolded (denatured) conformations
with higher enthalpies. For this purpose, we use simple
random-energy-like models with Gaussian enthalpy distributions
for the denatured states. Their (continuum)
densities of states $g(H)$ are given by 23

$$g(H) = \delta(H) + \theta(H) \, \frac{g_{\mathrm{D}}}{\sqrt{2\pi}\,\sigma_H} \, e^{-(H - H_{\mathrm{D}})^2/(2\sigma_H^2)} \tag{7}$$

where $\delta(H)$ is the Dirac delta function, the native enthalpy
$H_{\mathrm{N}} = 0$, the step function $\theta(H) = 1$ for $H > 0$,
and $\theta(H) = 0$ for $H < 0$. $g_{\mathrm{D}}$ ($\gg 1$) and $H_{\mathrm{D}}$ are respectively
the total number and average enthalpy of the denatured
conformations, whereas the standard deviation $\sigma_H$
specifies the width of the enthalpy distribution among
them (Figure 1); see Ref. 23 for details. The corresponding
partition function is $Q = Q_{\mathrm{N}} + Q_{\mathrm{D}}$, whose native part
$Q_{\mathrm{N}} = 1$ is the statistical weight of the native state, and
the denatured part is

$$Q_{\mathrm{D}}(T) = \frac{g_{\mathrm{D}}}{\sqrt{2\pi}\,\sigma_H} \int_0^{\infty} dH \, e^{-(H - H_{\mathrm{D}})^2/(2\sigma_H^2)} \, e^{-H/(k_B T)} \tag{8}$$

Hence $[D] = Q_{\mathrm{D}}/Q$. We perform numerical integrations
over $H$ to obtain thermodynamic averages such as native
and denatured populations [Eq. (8)], average enthalpy,
and heat capacity as functions of temperature,
from which the midpoint temperatures and $\kappa$'s defined
above are determined. To simplify these calculations,
rather than integrating through $H \to +\infty$, we use a high-$H$
cutoff that sets $g(H) = 0$ for $H > 4H_{\mathrm{D}}$ in Eq. (7). The
special case of a strictly two-state model (corresponding
to $\sigma_H \to 0$) is discussed in the Appendix.

For the class of models we study, we fix both the
average enthalpy ($H_{\mathrm{D}}$) and entropy (parametrized by
$g_{\mathrm{D}}$) of the denatured state. This leads 23 to an essentially
constant $\Delta H_{\mathrm{cal}} = H_{\mathrm{D}}$. Only the denatured enthalpy
distribution width $\sigma_H$ is varied. Here we use
$H_{\mathrm{D}}/k_B = 3 \times 10^4$ (equivalent to $H_{\mathrm{D}} = 60.0$ kcal mol$^{-1}$),
and $g_{\mathrm{D}} = 5.68 \times 10^{38}$ (Figure 1). These values are
the same as those used in our previous study. 23 They
correspond approximately to the experimental data obtained
by Jackson et al. 54 for the Ile$\to$Val76 mutant of
chymotrypsin inhibitor 2 (CI2; see Fig. 3 of Ref. 54).
Hence we believe that realistic protein energetics can be
explored using this class of models.
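To give a feel for how the width of the denatured enthalpy distribution enters, the following Python sketch uses a closed-form approximation to this class of models rather than the numerical integration described in the text. It assumes that the Gaussian is integrated over all $H$ (i.e., the cutoffs at $H = 0$ and $4H_{\mathrm{D}}$ are ignored), which gives $\ln Q_{\mathrm{D}} = \ln g_{\mathrm{D}} - H_{\mathrm{D}}/T + \sigma_H^2/(2T^2)$ in units with $k_B = 1$, and that $\Delta H_{\mathrm{cal}} = H_{\mathrm{D}}$. Function names are hypothetical, and because of these simplifications the numbers it produces need not coincide with those read from Figure 2.

```python
import math

H_D = 3.0e4                  # average denatured enthalpy in units of k_B (value from the text)
LN_G_D = math.log(5.68e38)   # ln of the number of denatured conformations (value from the text)


def ln_Q_D(T, sigma):
    """ln Q_D for an untruncated Gaussian density of states (assumption), k_B = 1."""
    return LN_G_D - H_D / T + sigma ** 2 / (2.0 * T ** 2)


def midpoint_T_half(sigma):
    """Temperature at which [D] = 1/2, i.e. ln Q_D = 0.

    ln Q_D = 0 is a quadratic in T; the upper root is the folding transition.
    """
    disc = H_D ** 2 - 2.0 * LN_G_D * sigma ** 2
    if disc < 0.0:
        raise ValueError("sigma too large: no transition in this approximation")
    return (H_D + math.sqrt(disc)) / (2.0 * LN_G_D)


def kappa_0(sigma):
    """Population-based ratio in this approximation.

    With K = Q_D, T^2 d(ln K)/dT = H_D - sigma^2 / T, evaluated at T_1/2
    and divided by Delta H_cal = H_D.
    """
    return 1.0 - sigma ** 2 / (midpoint_T_half(sigma) * H_D)


ratios = [(s, kappa_0(s)) for s in (0.0, 500.0, 1000.0, 1500.0)]
```

In this approximation the ratio equals unity only for a vanishing width and decreases monotonically as the distribution broadens, while the midpoint temperature decreases at the same time.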

Figure 2 shows how the model midpoint temperatures
and thermodynamic cooperativity vary with $\sigma_H$. The
calorimetric two-state criterion allows for some tolerance.
This is because even small single-domain proteins deviate
slightly from a strictly two-state description, 33 with
$\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$ slightly less than unity. So we do not have
to require model $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$ to be exactly equal to
unity. Nonetheless, it is also clear that the experimental
observation of $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} \approx 1$ imposes severe constraints
on enthalpy distributions in proteins. Experimentally,
$\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} = 0.96$ is reported by Fersht and
coworkers 54 for CI2; other calorimetric two-state proteins
have similar $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s (Ref. 33). For the present
models, if the $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s are to be $> 0.96$, it requires
$\sigma_H < 775$ (Figure 2b, in units of $k_B$). This means
a very narrow denatured enthalpy distribution, as the
standard deviation $\sigma_H$ has to be less than or equal to
$775/(3 \times 10^4) \approx 1/40$ of the average enthalpic separation
between the native and the denatured states, $\Delta H_{\mathrm{cal}}$
(see Figure 1). Within this class of models, thermodynamic
stability correlates with cooperativity (Figure 2a).
For $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} \approx 1$, the folding transition temperature
$\approx 65$°C corresponds to that observed experimentally. 54
However, stability decreases as the denatured enthalpy
distribution widens. The transition temperature falls below
0°C when $\sigma_H$ exceeds $\approx 1/17$ of $\Delta H_{\mathrm{cal}}$.

Figure 2a shows the relation among the three midpoint
temperatures. They are essentially identical when the
model protein is highly cooperative (small $\sigma_H$). The
difference between $T_d$ and the other two temperatures
increases as cooperativity diminishes. This is because
when the enthalpy distribution in the denatured state is
wide (large $\sigma_H$), there are more low-lying nonnative enthalpies,
which tend to lower the overall average enthalpy.
As a result, more than half of the chain population has
to be denatured (hence a higher temperature than $T_{1/2}$
is required) to achieve an average enthalpy of $\Delta H_{\mathrm{cal}}/2$
than when the denatured enthalpy distribution is narrower
(smaller $\sigma_H$). This accounts for the differences
among the three $\kappa$'s [Eq. (6)] and $(\kappa)^2$'s in Figures 2c and
d. For real two-state proteins, $T_d$ can differ from $T_{\max}$ by
$\lesssim 1$°C (Ref. 50). On the other hand, $T_{\max}$ is practically
identical to $T_{1/2}$ for a much wider range of cooperativity
for these models. It appears that $T_{\max} \approx T_{1/2}$ is a
consequence of $g_{\mathrm{D}} \gg 1$. Model proteins with less conformational
freedom 23 than those considered in Figures 1
and 2 have non-negligible differences between $T_{1/2}$ and
$T_{\max}$ (see Appendix and discussions on three-dimensional
lattice models below).
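The argument for why $T_d$ rises above $T_{1/2}$ as the denatured enthalpy distribution widens can be illustrated with a short Python sketch. As before, this is a simplified stand-in for the numerical integration used in the text: it assumes the untruncated Gaussian closed form (so that the average denatured enthalpy is $H_{\mathrm{D}} - \sigma_H^2/T$ with $k_B = 1$), takes $\Delta H_{\mathrm{cal}} = H_{\mathrm{D}}$, and locates both midpoints by bisection on a bracketing interval of 200 to 1000 K that is itself an assumption suitable only for moderate widths. All function names are hypothetical.

```python
import math

H_D = 3.0e4                  # in units of k_B (value from the text)
LN_G_D = math.log(5.68e38)   # value from the text


def denatured_fraction(T, sigma):
    """[D] = Q_D / (1 + Q_D) for the untruncated Gaussian closed form (assumption)."""
    ln_q = LN_G_D - H_D / T + sigma ** 2 / (2.0 * T ** 2)
    if ln_q >= 0.0:
        return 1.0 / (1.0 + math.exp(-ln_q))
    e = math.exp(ln_q)
    return e / (1.0 + e)


def excess_enthalpy(T, sigma):
    """<Delta H> = [D] * (H_D - sigma^2 / T), with the native enthalpy set to zero."""
    return denatured_fraction(T, sigma) * (H_D - sigma ** 2 / T)


def bisect(f, lo, hi, tol=1e-9):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def T_half(sigma, lo=200.0, hi=1000.0):
    """Temperature at which half of the population is denatured."""
    return bisect(lambda T: denatured_fraction(T, sigma) - 0.5, lo, hi)


def T_d(sigma, lo=200.0, hi=1000.0):
    """Temperature at which half of the calorimetric heat has been absorbed."""
    return bisect(lambda T: excess_enthalpy(T, sigma) - 0.5 * H_D, lo, hi)


gaps = {s: T_d(s) - T_half(s) for s in (0.0, 775.0, 1500.0)}
```

In this sketch the two midpoints coincide for a vanishing width, and their separation grows as the width increases, in line with the qualitative explanation given above.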

Figures 2c and d compare the population-based 23 $\kappa_0 =
\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$ with experimental formulas and their variations.
For this class of models, $\kappa_0 = \kappa_1 = \kappa_2$ holds
almost exactly. Owing to the behavior of $T_d$ discussed
above, $\kappa_3$ deviates from the other three $\kappa$'s when the
model is not cooperative, but all four $\kappa$'s are practically
identical if their values are $> 0.9$. When the enthalpy
ratios $\kappa$'s are less than one, naturally the square-root
($\kappa$) formulas of Eq. (6) give larger van't Hoff to calorimetric
enthalpy ratios than the $(\kappa)^2$ formulas. The latter
equate $\Delta H_{\mathrm{vH}}$ with $4 k_B T_{\mathrm{midpoint}}^2 C_P(T_{\mathrm{midpoint}})/\Delta H_{\mathrm{cal}}$
(Refs. 22, 40, 41, 51). However, when any one of the
$\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s equals unity, it implies that all other
$\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s also equal unity.

These observations suggest that the following general
conclusion should be valid: Insofar as a protein can be
fully denatured by heat 23 (as these models are), which
implies that it has a sufficiently high denatured-state entropy
relative to the native state (which should be satisfied
by all proteins because of their polymeric nature), all
of the $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}}$'s considered in this paper provide essentially
the same calorimetric two-state conditions, and
thus have the same requirement on the density of states
of the proteins.

Recently, Zhou et al. 22 used a homopolymer tetramer
model to show that it is possible to have $(\kappa_3)^2 > 1$, and
that the deviation from the calorimetric criterion is not
simply related to the population with intermediate enthalpies.
Remarkably, the thermodynamic properties of
their continuum tetramer model are very similar 23 to those
of a lattice tetramer toy model introduced previously by
Dill et al. 4 Since the ground-state populations of these
small systems are substantial 23 even under athermal
conditions ($T = \infty$), they cannot be fully "denatured."
Hence this interesting and important observation of Zhou
et al. is not inconsistent with our general conclusion regarding
proteins. The present study does not address the
application of van't Hoff analysis to chemical reactions in
solutions 55 because of fundamental differences between
chemical reactions and the conformational transition of
polymeric systems treated here.

Calorimetric two-state cooperativity implies a 
very low "glass transition" temperature for the 
folding of two-state proteins. 

The above thermodynamic results are relevant to folding
kinetics, especially landscape theories that utilize
the spin-glass approach put forth in the seminal work
of Bryngelson and Wolynes. 56,57 It has been argued, and
has been generally accepted, that in order for a protein to
fold in a kinetically efficient manner, its folding transition
temperature $T_f$ must be significantly greater than a glass
temperature $T_g$ that characterizes the onset of sluggish
folding kinetics as the temperature is lowered 58 (reviewed
in Refs. 3, 4). Subsequently, based on a series of insightful
studies by Onuchic, Wolynes and coworkers, 45,59,60
it has been further argued that a "law of corresponding
states" 6,59,60 can be used to predict the ratio $T_f/T_g$ for
real proteins from simulations of a 27mer 3-letter code
(3LC) model protein configured on three-dimensional cubic
lattices 45,59 (see discussion below). This approach
provided an estimate of $T_f/T_g \approx 1.6$ for small $\alpha$-helical
proteins. 6,42,43,59 More recently, Onuchic et al. 9 considered
the thermodynamics of a Gaussian random energy
model similar to the one employed here and derived the
relation $T_f/T_g = (H_{\mathrm{D}}/\sigma_H)\sqrt{2/\ln g_{\mathrm{D}}}$ (in the present
notation). (See footnote 2.)



Footnote 2: Solvent-mediated (effective) intraprotein interactions can
have enthalpic as well as entropic contributions. However,
heat-induced conformational changes would be impossible if
these interactions do not contain enthalpic parts. The interaction
energy $E$ was taken to be purely enthalpic in Onuchic
et al.'s random-energy treatment of temperature dependences
The estimate $T_f/T_g \approx 1.6$ was based on kinetic simulations.
As such, it may be viewed as a lower bound for a
protein to satisfy a certain requirement for foldability. A
previous random-energy-model analysis already suggests
that a higher thermodynamic $T_f/T_g$ ratio may be needed
to satisfy the additional constraint imposed by calorimetric
two-state cooperativity. 23 Figure 2b shows calorimetric
cooperativity as a function of $T_f/T_g$ (the horizontal
axis is marked by the inverse of this ratio, $T_g/T_f$, by applying
Eq. (12) of Onuchic et al. 9). Using realistic protein
parameters, 23,54 Figure 2b shows that in the context
of the present random-energy model analysis, for a protein's
$\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} > 0.96$ (Ref. 54), it is necessary for
$T_f/T_g > 5.8$; $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} > 0.99$ implies $T_f/T_g > 10.0$;
and $T_f/T_g \approx 1.6$ would imply that the protein is not
calorimetrically two-state, with $\Delta H_{\mathrm{vH}}/\Delta H_{\mathrm{cal}} < 0.2$.
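The connection between the denatured enthalpy width and the folding-to-glass temperature ratio can be checked with a few lines of Python. The sketch assumes that the relation attributed to Onuchic et al. reads $T_f/T_g = (H_{\mathrm{D}}/\sigma_H)\sqrt{2/\ln g_{\mathrm{D}}}$ in the present notation, and uses the CI2-like parameters quoted in the text; the function names are hypothetical.

```python
import math


def tf_over_tg(H_D, sigma_H, g_D):
    """T_f / T_g = (H_D / sigma_H) * sqrt(2 / ln g_D), with H_D and sigma_H in the same units."""
    return (H_D / sigma_H) * math.sqrt(2.0 / math.log(g_D))


def glass_temperature(T_f, ratio):
    """T_g, in the same units as T_f, for a given T_f / T_g ratio."""
    return T_f / ratio


# Parameters from the text: H_D / k_B = 3e4, g_D = 5.68e38, sigma_H = 775 (units of k_B).
ratio_at_775 = tf_over_tg(3.0e4, 775.0, 5.68e38)

# For any assumed folding temperature, the implied glass temperature follows directly.
T_g_example = glass_temperature(373.15, 6.0)  # in kelvin
```

With these inputs the ratio comes out close to 5.8, matching the threshold quoted for $\sigma_H \approx 775$, and the ratio is inversely proportional to the width, so halving the width doubles $T_f/T_g$.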

Therefore, combining our results with Onuchic et al.'s 
analysis 9 leads us to the conclusion that for proteins that 
are calorimetrically two-state, Tf/T g should be higher 
than the earlier estimate of 1.6, and may well exceed 
6.0. In that case, even for a hypothetical highly stable
two-state protein with $T_f \approx 100$°C (373.15 K), $T_g$ is
still very low, at $\approx 62$ K. This folding glass transition
temperature is a theoretical construct for quantitating 
a "rugged" landscape's impediment to the kinetics of 
folding from the denatured to the native state. The 
physics it describes is different from the "glass tran- 
sition" of native proteins observed experimentally at 
$\approx 200$ K (see, for example, Ref. 61), though it has been
suggested 59 that the two phenomena might be related.
The present calorimetric estimate of $T_g \approx 62$ K is much
lower than temperatures at which folding actually takes 
place. While the idealized enthalpy distribution of a 
random-energy model without explicit chain represen- 
tation might have underestimated the chance of having 
low-enthalpy kinetic traps, such traps should neverthe- 
less be improbable given this extremely low estimate for 
$T_g$. Therefore, our results suggest that in general kinetic
traps should have at most minimal effects on the folding 
of real calorimetrically two-state proteins of sizes compa- 
rable to CI2. 19,37,42,43 This view is apparently supported 
by recent folding experiments on proteins with no kinetic 
intermediates. 62-67 In this perspective, it would be par- 
ticularly revealing to elucidate the relationship between 
multi-phasic kinetics and calorimetric cooperativity for 
real proteins that do fold with kinetic intermediates (see, 
for example, Refs. 68-70, and theoretical perspectives in 
Refs. 3-8, and 11). 

Lattice protein models: Why compare them 
against the calorimetric two-state criterion? 



We now turn to protein models with explicit chain rep- 
resentations. Recent years have seen sustained efforts in 
using highly simplified lattice models to understand gen- 
eral properties of proteins. Lattice protein models were 
pioneered by Go and coworkers. 15 Go models assume that 
only those contact interactions that occur in the native 
conformation can be favorable, whereas all nonnative in- 
teractions are neutral. This approach to modeling may 
be characterized as teleological, because the native con- 
formation is hardwired explicitly into the model poten- 
tial function. A lot of useful insight has been gained by 
this methodology. But it is important to realize that 
a Go model leaves open the question as to what physi- 
cal interactions can conspire to produce the remarkable 
molecular recognition effect it has assumed. 

An essential difference between Go models and models
introduced in the past decade, beginning with the
simplest 2-letter HP potential, 30,32 is that many of the
more recent models have adopted microscopic interaction 
schemes that are independent of a particular native con- 
formation. Therefore, these models offer the possibility 
to better explore the physico-chemical bases of protein 
folding. While much has been learnt (see, for example,
Refs. 1-9, 11, 12), the goal of using these models to elu- 
cidate general protein properties has not been fully real- 
ized. One of the most generic thermodynamic properties 
of many small single-domain proteins is their calorimetric 
two-state cooperativity. However, no three-dimensional 
lattice model has been evaluated against the calorimetric 
two-state criterion. We do so here for six representa- 
tive models. This was motivated by a previous study of 
two-dimensional models, 23 which has led us to suspect 
that to design a physically plausible three-dimensional 
interaction scheme to reproduce calorimetric two-state 
behaviors might be non-trivial, and that other deficien- 
cies of lattice models in describing real two-state protein 
properties 37 might be intimately related to their lack of 
calorimetric two-state cooperativity. 

We take this as the first step in an endeavor to build 
simple tractable self-contained models to capture more 
proteinlike features. It is hoped that once models are
required to better conform to the calorimetric two-state
criterion, mechanisms for other two-state proteinlike
properties would either be apparent or become more easily
decipherable. From this vantage point, the substantial 
amount of lattice model data accumulated over the years 
constitutes a valuable repository of information. By ap- 
plying appropriate experimental tests on these models 
for their similarities with and their differences from real 
protein behaviors, one would gain new insight into what
novel energetic ingredients might be necessary for
building better models.

that leads to Eq. (12) in Ref. 9.

We consider six models, 44-49 as shown in Figure 3. We 
choose to analyze these models in depth because they are 
representative and instructive, covering a variety of
approaches and assumptions employed in recent efforts to
model proteins as chains configured on three-dimensional 
simple cubic lattices. Some models in Figure 3 have 
been studied extensively and contributed significantly 
to the advances in theoretical understanding. All these 
models are based upon additive pairwise
nearest-lattice-neighbor contact energies. As described in the original
references, 44-49 the contact energies are all assumed to 
be temperature independent. We therefore refer to these 
energies as enthalpies, as in Ref. 23, to conform to the 
terminology in the experimental calorimetric literature. 

Lattice simulation methods. 

Using the model potential functions described in their 
respective original studies, 44-49 thermodynamic quan- 
tities of these models were computed using standard 
Metropolis Monte Carlo (MC) histogram techniques. 71,72
The chain move set we used consists of end, cor- 
ner, and crankshaft moves, as described by Socci and 
Onuchic, 44 with additional sidechain moves for the 20- 
letter sidechain model (Figure 3f). 48 Each histogram was 
computed using a total of $4.5 \times 10^{8}$ attempted moves,
with data collected after an initial equilibrating
run of $5 \times 10^{7}$ attempted moves. Every
attempted move is counted as elapsed MC time in
computing Boltzmann averages, whether it is accepted 
or rejected; and if rejected, regardless of whether it is 
caused by excluded volume violation or by the stochas- 
tic Metropolis algorithm for an attempted move that in- 
volves a finite increase in energy (enthalpy). The sim- 
ulation temperatures are given in the captions for Fig- 
ures 4-9. In one case (the Go model in Figure 7), we also 
performed several independent MC simulations at differ- 
ent temperatures to confirm the MC histogram results. 
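
The counting rule described above can be illustrated on a deliberately tiny toy system. The sketch below is not the lattice-chain simulation used in this work. It assumes a hypothetical one-dimensional ladder of enthalpy levels, in which a proposed step off either end of the ladder plays the role of an excluded-volume violation, and it uses single-histogram reweighting as one common form of MC histogram technique (whether this matches the cited procedures in detail is an assumption). All function names and parameters are illustrative.

```python
import math
import random

def metropolis_ladder(n_levels, eps, T, n_steps, n_equil, seed=0):
    """Metropolis walk on enthalpy levels H = 0, eps, ..., (n_levels-1)*eps.

    Returns a histogram of visits per level. Every attempted move after
    equilibration adds one count, whether it was accepted, rejected by the
    Metropolis test, or rejected because it would leave the allowed range.
    """
    rng = random.Random(seed)
    state = 0
    hist = [0] * n_levels
    for step in range(n_equil + n_steps):
        trial = state + rng.choice((-1, 1))
        if 0 <= trial < n_levels:            # otherwise "excluded": stay put
            dH = (trial - state) * eps
            if dH <= 0 or rng.random() < math.exp(-dH / T):
                state = trial
        if step >= n_equil:
            hist[state] += 1                 # elapsed MC time, always counted
    return hist

def reweighted_mean_H(hist, eps, T_sim, T_new):
    """Single-histogram reweighting of <H> from T_sim to T_new (k_B = 1)."""
    dbeta = 1.0 / T_new - 1.0 / T_sim
    num = den = 0.0
    for level, count in enumerate(hist):
        w = count * math.exp(-dbeta * level * eps)
        num += w * level * eps
        den += w
    return num / den

def exact_mean_H(n_levels, eps, T):
    """Exact Boltzmann average of H on the ladder, for comparison."""
    w = [math.exp(-i * eps / T) for i in range(n_levels)]
    return sum(i * eps * wi for i, wi in enumerate(w)) / sum(w)

hist = metropolis_ladder(n_levels=5, eps=1.0, T=1.0,
                         n_steps=200_000, n_equil=20_000)
print(reweighted_mean_H(hist, 1.0, 1.0, 1.0), exact_mean_H(5, 1.0, 1.0))
print(reweighted_mean_H(hist, 1.0, 1.0, 1.2), exact_mean_H(5, 1.0, 1.2))
```

In this toy, detailed balance at the two ends of the ladder holds only because the boundary-rejected proposals leave the walker in place and are still counted; dropping those counts would bias the histogram toward interior levels.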



Our sampling of the densities of states should be ade- 
quate since we obtained essentially the same midpoint 
temperatures as the original studies for all six models. 3 

Thermodynamic functions relevant to calorimetric 
considerations are plotted in Figures 4-9. In these figures,
$T_{1/2}$ is the temperature at which the chain population
[N] in the single lowest-enthalpy conformation
equals 1/2. This single-lattice-conformation definition
of the model native state and the corresponding
identification of $T_{1/2}$ with the folding transition temperature
coincide with the original formulations in four of the
models. 44-47 However, a multiple-lattice-conformation
native state containing other conformations in addition
to the lowest-enthalpy conformation was advocated by
the authors of the two 20-letter models. 48,49 Hence,
according to their definitions, the "native" populations in
their models 48,49 are larger than [N] in Figures 6 and 9.
We will give more detailed consideration to the issue of 
native state definition below. 
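
As a concrete illustration of the single-conformation definition, the following minimal sketch locates $T_{1/2}$ for a density of states supplied as (enthalpy, degeneracy) pairs. The level values in `toy` are hypothetical and are not taken from any of the six models; units with $k_B = 1$ are assumed, and the function names are illustrative only.

```python
import math

def populations(levels, T):
    """Boltzmann populations for levels given as (H, g) pairs, with k_B = 1."""
    h_min = min(h for h, _ in levels)
    weights = [g * math.exp(-(h - h_min) / T) for h, g in levels]
    z = sum(weights)
    return [w / z for w in weights]

def native_population(levels, T):
    """[N]: population of the single lowest-enthalpy level."""
    idx = min(range(len(levels)), key=lambda i: levels[i][0])
    return populations(levels, T)[idx]

def find_t_half(levels, t_lo, t_hi, tol=1e-10):
    """Bisection for [N](T) = 1/2; [N] decreases monotonically with T."""
    while t_hi - t_lo > tol:
        t_mid = 0.5 * (t_lo + t_hi)
        if native_population(levels, t_mid) > 0.5:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)

# Hypothetical levels: one native conformation plus two nonnative groups.
toy = [(-10.0, 1), (-8.0, 20), (0.0, 5000)]
t_half = find_t_half(toy, 0.1, 10.0)
print(t_half, native_population(toy, t_half))
```

A multiple-conformation native state could be mimicked in the same sketch by summing the populations of several low-enthalpy levels instead of only the lowest one, which necessarily gives a "native" population at least as large as [N].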

Evaluating lattice protein models against the 
calorimetric two-state criterion. 

A First Step: Modeling Heat Capacity Functions With 
No Baseline Subtractions 

We first apply the model heat capacity and enthalpy
functions in Figures 4-9 directly to the relation 23
$\kappa_0 = \langle \Delta H(T_{1/2}) \rangle_{\rm D} / \Delta H_{\rm cal}$
and Eq. (6) above to compute various
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratios in Table II. This is equivalent
to assuming that for each model (as for the random-energy
models above), the entire model Cp function is
directly comparable to the "transition" part of an
experimental excess heat capacity function, the analysis
of which has led to the calorimetric two-state condition
$\Delta H_{\rm vH}/\Delta H_{\rm cal} \approx 1$ for many small proteins.
Experimentally, the transition part of the excess heat capacity is
obtained by performing baseline subtractions on the raw
data. 23,41 The exercise we now undertake is a necessary
and instructive starting point that involves minimal
assumptions, 23 as it does not entail performing any
baseline subtraction on model results. After a basic
perspective has been gained, we will discuss in a later section
the feasibility and appropriateness of applying baseline
subtractions to model specific heat functions.

3 For the 20-letter model, the temperature at which the
Boltzmann average $\langle Q \rangle$ of the number of native contacts $Q$
equals one half of the total number $Q_{\rm N}$ of native contacts
was reported to be 0.272 in Ref. 49 (note that this $Q$ is
different from the symbol $Q$ for the partition function), whereas the
present simulation gives 0.279. The discrepancy is small.
However, it is not clear whether it merely reflects
numerical uncertainties or is related to a possible
systematic deviation from the correct Boltzmann distribution in
previous simulations in which attempted moves rejected by
excluded volume violations were not counted as elapsed MC
time (page 185 of Ref. 49, page 1617 of Ref. 73), as has been
noted recently (Ref. 47).

In addition to the Cp functions, the upper panels of
Figures 4-9 also show the heat capacity contributions
$(C_P)_{\rm D}[{\rm D}]$ from thermal transitions among nonnative (in
these cases, non-ground-state) conformations. 23 When
a large fraction of Cp arises from transitions among
nonnative conformations instead of transitions between
native (N) and nonnative (D) conformations, significant
deviations from calorimetric two-state behavior by
the $\kappa_0 \approx 1$ standard are expected 23 (Table II). This
is because a large $(C_P)_{\rm D}[{\rm D}]$ contribution means that
even after passing the denaturation transition midpoint
(when [D] > 1/2), the average denatured enthalpy will
continue to rise substantially when the temperature is
further raised (see the lower panel of Figure 4, for
example), as denatured chains are propelled to populate
conformations at higher and higher enthalpies. Table II
summarizes the six models' conformity to calorimetric
two-state criteria based on different $\Delta H_{\rm vH}$'s.
Calorimetric cooperativities measured by common experimental
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$ formulas such as $(\kappa_2)^2$ and
$(\kappa_3)^2$ (see Table I) can readily be calculated from Table II.

None of the Models Tested Meets the Calorimetric Two- 
State Standard 

Table II shows that none of the six models tested by the
present method meets the experimental calorimetric
two-state standard. Among them, the Go model appears to
be most cooperative, with $\kappa_0 = 0.54$ and
$\kappa_2 \approx \kappa_3 = 0.87$.
If the common experimental formulas $(\kappa_2)^2$ (Ref. 41)
and $(\kappa_3)^2$ (Ref. 51) for the van't Hoff to calorimetric
enthalpy ratio are used, this translates into
$\Delta H_{\rm vH}/\Delta H_{\rm cal} \approx 0.75$ for this particular
Go model. This is still low
when compared with experimental values of $\approx 0.96$
(Ref. 54) for calorimetrically two-state proteins. For
five small compact globular proteins (ribonuclease A,
lysozyme, α-chymotrypsin, cytochrome c, and
metmyoglobin), Privalov 51 reported an average
$\Delta H_{\rm vH}/\Delta H_{\rm cal} = (\kappa_3)^2 = 0.96 \pm 0.03$.

Different Calorimetric Criteria are Related to Definitions 
of the Native State — 20-Letter Models 

For the models tested, the $\Delta H_{\rm vH}/\Delta H_{\rm cal}$ values
($\kappa$'s) vary considerably depending on what definition of van't
Hoff enthalpy is used (Table II). The variation is mildest
for the 2- and 3-letter models, for which the
population-based $\kappa_0$ is almost identical to one of the experimental
square-root formulas, $\kappa_3$. And while $\kappa_2$'s are different
from $\kappa_3$'s for these two models, they are only 27-38%
larger than $\kappa_0$. For the other four models, the difference
between $\kappa_0$ and the experimental formulas $\kappa_2$ or $\kappa_3$ is
larger: $\kappa_3$ is 1.6-1.8 times $\kappa_0$ for the Go and
modified HP models, whereas $\kappa_3$ is $\approx 7$ times bigger than
$\kappa_0$ for the two 20-letter models. For the latter four models,
however, $\kappa_2$ is virtually identical to $\kappa_3$.

The differences among $\kappa$'s are often related to
differences in the midpoint temperatures used to define them.
For the 2- and 3-letter models (Figures 4 and 5), the
temperature $T_{1/2}$ for the population-based $\kappa_0$ (and $\kappa_1$) is
well within the peak region of the specific heat capacity
function and quite close to the temperature $T_{\max}$ for $\kappa_2$.
This accounts for the relatively small differences among $\kappa_0$,
$\kappa_1$, and $\kappa_2$ in these models. The difference between $\kappa_0$
and $\kappa_2$ is larger for the Go and modified HP models, but
$T_{1/2}$ still lies within the peak region of the Cp function
and is not that far away from $T_{\max}$ (Figures 7 and 8). The
difference between $\kappa_0$ and $\kappa_2$ is much larger for the two
20-letter models. In these constructs, $T_{1/2}$ is well outside
the peak region of Cp ($T_{1/2} \ll T_{\max}$, see Figures 6 and
9). On the other hand, $T_{\max} \approx T_{\rm d}$ for the Go model and
the 20-letter model without sidechains (Figures 6 and 7),
hence they have $\kappa_2 \approx \kappa_3$.

The large temperature differences between $T_{1/2}$ and
$T_{\max}$ in Figures 6 and 9 highlight one peculiar feature
of the two 20-letter models which is qualitatively
different from the other four models. For both of them, the
population [N] of the single ground-state conformation is
below 10% at $T_{\max}$, whereas the Cp at $T_{1/2}$ (when [N] =
1/2) is very low. This feature is intimately related to the
rationale for adopting a multiple-lattice-conformation
native state in these models. 48,49 In physical terms, it
means that Cp is dominated at low temperatures by
transitions among the single ground-state conformation
and other conformations with very low (close to
ground-state) enthalpies; most of these conformations belong
to these models' multiple-conformation native state as
defined by their authors 48,49 (see below). When the
temperature is raised, population in the single ground-state
conformation continues to decrease as more of it is
transferred to other low-enthalpy conformations.
Therefore, when the temperature reaches $T_{\max}$, contributions
to the peak value of Cp are dominated by transitions
between the group of low-enthalpy conformations as a
whole and the large number of high-enthalpy
conformations. By that time the population [N] in the single
ground-state conformation has become quite
insignificant. This is the basic reason why
$\kappa_2 \gg \kappa_0 \approx \kappa_1$ for
these two models (Table II).

Model Heat Capacity Functions can be Compared Di- 
rectly with Experiments — Go and 20-Letter Mainchain 
Models are More Cooperative 

By considering random-energy models, we have argued
above that all common calorimetric criteria using
different $\kappa$'s are essentially equivalent when
$\Delta H_{\rm vH}/\Delta H_{\rm cal} \approx 1$
and the native state is represented by a single enthalpy
value in an effective density of states that describes the
transition part of an experimental excess heat capacity
function after proper baseline subtractions. 23 The
behavior of the two 20-letter models prompts us to ask a more
general question: Which $\kappa$ computed from a model would
be most relevant for comparing theory with experiment
when $\Delta H_{\rm vH}/\Delta H_{\rm cal}$ deviates significantly from unity and
the native state of the chain model may have multiple
enthalpy levels?

From an operational standpoint, among the
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$'s considered, $\kappa_2$, $\kappa_3$,
$(\kappa_2)^2$, and $(\kappa_3)^2$ are
most directly related to experiments. This is because
they can be determined by analyzing the model Cp
function alone (which corresponds to an experimental
calorimetric scan) without involving an a priori
definition of the "native state" (whereas such a definition
is needed to determine $T_{1/2}$ for $\kappa_0$ and $\kappa_1$). It is also
prudent not to commit prematurely to a general
single-lattice-conformation definition of the native state.
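
To make the operational point concrete, the sketch below computes a $\kappa_2$-type ratio from a heat capacity scan alone. It assumes that Cp is given by the enthalpy variance divided by $k_B T^2$, that the van't Hoff enthalpy at the heat capacity peak takes the standard form $2 T_{\max} \sqrt{k_B C_P(T_{\max})}$, and that $\Delta H_{\rm cal}$ is the area under Cp over the scanned range; these are taken here to correspond to the $\kappa_2$ entry of Table I, which is an assumption since that table is not reproduced in this section. The two-level density of states is hypothetical, $k_B = 1$, and all names are illustrative.

```python
import math

def mean_and_variance_H(levels, T):
    """Boltzmann mean and variance of H for (H, g) levels, with k_B = 1."""
    h_min = min(h for h, _ in levels)
    weights = [g * math.exp(-(h - h_min) / T) for h, g in levels]
    z = sum(weights)
    mean = sum(w * h for w, (h, _) in zip(weights, levels)) / z
    var = sum(w * (h - mean) ** 2 for w, (h, _) in zip(weights, levels)) / z
    return mean, var

def heat_capacity(levels, T):
    """Cp(T) = Var(H) / (k_B T^2), with k_B = 1."""
    return mean_and_variance_H(levels, T)[1] / (T * T)

def kappa_2(levels, t_lo, t_hi, n_grid=20000):
    """Return (kappa_2, T_max) from a Cp scan over [t_lo, t_hi]."""
    temps = [t_lo + (t_hi - t_lo) * i / n_grid for i in range(n_grid + 1)]
    cps = [heat_capacity(levels, t) for t in temps]
    i_max = max(range(len(cps)), key=cps.__getitem__)
    t_max, cp_max = temps[i_max], cps[i_max]
    # The area under Cp over the scan equals <H>(t_hi) - <H>(t_lo).
    dh_cal = (mean_and_variance_H(levels, t_hi)[0]
              - mean_and_variance_H(levels, t_lo)[0])
    dh_vh = 2.0 * t_max * math.sqrt(cp_max)
    return dh_vh / dh_cal, t_max

# Hypothetical strict two-state system: one native conformation and a
# single denatured enthalpy level with a large degeneracy.
two_state = [(0.0, 1), (20.0, 10**6)]
k2, t_max = kappa_2(two_state, 0.2, 20.0)
print(k2, t_max)
```

No definition of the native state enters this calculation, and for the strict two-level case the ratio comes out close to unity, which is the behavior the calorimetric criterion is designed to detect.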

By this operational standard, the 20-letter model
without sidechains is second most cooperative after the Go
model, with $\kappa_2 \approx \kappa_3 = 0.66$. On the other hand, the
2-letter, 3-letter and modified HP models are far from
being calorimetrically two-state by all standards
considered here: none of their $\kappa$'s exceeds 0.46; in fact they
are often much lower (Table II). In these models, at any
one of the transition midpoints, the average enthalpic
difference $\langle \Delta H(T) \rangle_{\rm D}$ between the denatured state and
the single native conformation is low relative to $\Delta H_{\rm cal}$
(lower panels of Figures 4, 5, 9).

2- and 3-Letter Models are Less Cooperative: "Variable
Two-State" Does Not Equal "Calorimetrically Two-State"

For the 2-letter model in Figures 3a and 4, a previous
study has shown that its denatured enthalpy distribution
is a broad shifting peak whose center position moves
continuously to higher values as temperature is increased
(for example, the peak is at $H \approx -64$ at $T = 1.26$ whereas
it is at $H \approx -16$ at $T = 5.00$; see Fig. 5 of Ref. 72, where
our $H$ is equivalent to their $E$). Therefore, this 2-letter example
corresponds to the "variable two-state" case of Dill and
Shortle (Fig. 1B of Ref. 74) with heat (increasing
temperature) as the "denaturing agent." The observation
here implies that the variable two-state scenario can differ
substantially from a calorimetric two-state transition if it
entails significant post-denaturational shifting of the
enthalpy distribution among the denatured conformations.
The present calorimetric analysis agrees with previous
assessments 75 that the 3-letter model is more
cooperative (has larger $\kappa$'s, Table II) than the 2-letter model,
though both are far from being calorimetrically
two-state. We will consider the 3-letter model in more detail
below. The modified HP model in Figures 3e and 8 was
motivated by considerations of hydration effects. Its
potential function is based on two residue types (H and P),
with novel features 47 such that it effectively interpolates
between the standard HP potential 30,32 (when chain
conformations are open) and the "AB" potential 76-78 (when
chain conformations are compact). In the AB potential,
like residues attract and unlike residues repel. Repulsive
interactions 19,77 of the AB type facilitate sequence design
and enhance kinetic foldability in this modified model
relative to the standard HP model, 47 though they are
insufficient for calorimetric two-state cooperativity (Table II).
It is interesting to note that the spatial organization
of residues in the native conformation of this modified
HP model (Figure 3e) is dictated mainly by the AB
potential. Consequently, the two types of residues are
segregated to opposite sides of the structure to minimize
contact, rather than organizing into a hydrophobic (H)
core surrounded by polar (P) residues as in typical HP
ground-state conformations. 79

Short 20-letter Sidechain Models are not Calorimetrically
Cooperative

We have also calculated Klimov and Thirumalai's 48
cooperativity parameter $\Omega_c$ by extending the MC histogram
technique to compute the temperature dependence of
their structural overlap function $\chi$ (Refs. 23, 48). The
results are included in Table II. While $\Omega_c$ is basically a
measure of the sharpness of a transition and does not
always correlate with the degree of conformity to
calorimetric two-state cooperativity, 23 for these six models the
rank ordering of the three most cooperative models by $\kappa_2$
coincides with their rank ordering by $\Omega_c$. This suggests
that $\Omega_c$ may correlate reasonably well with calorimetric
cooperativity if the conformational entropies of the chain
models in question are similar. The calorimetric
cooperativity as measured by $\kappa_2$ and $\kappa_3$ of the 15mer 20-letter
sidechain model of Klimov and Thirumalai is low
(Figures 3f and 9, their "sequence A"), and is comparable to
that of the 2-letter, 3-letter, and modified HP models.
Remarkably, by the $\Omega_c$ measure, it is by far the least
cooperative among the six models. We have also
completed the same analysis for their other sidechain model,
"sequence B." The results are similar ($\kappa_2 = 0.25$, other
data not shown). The low levels of calorimetric
cooperativity in these sidechain models may be a consequence
of the shortness of the chains, as it has been observed
that models with sidechains on average have higher $\Omega_c$'s
than non-sidechain models with the same number of
mainchain monomers. 48 Nonetheless, the present results
mean that how sidechains may enhance thermodynamic
cooperativity in longer chain models is a question that
remains to be answered.

The Enthalpy Distribution of the Go Model is Trimodal

We now take a closer look, as an example, at how the
underlying enthalpy distribution of the Go model
(Figures 3d, 7) gives rise to its relatively high
cooperativity by the calorimetric criterion. Figure 10 shows that
the Go model enthalpy distribution is very different from
that of models with much lower cooperativities, such as
the 2-letter model of Socci and Onuchic. 44 The enthalpy
distribution of the 2-letter model in Figures 3a and 4 is
bimodal: the lower mode peaks at the ground-state
native enthalpy ($-84$) and encompasses enthalpies $< -77$,
whereas the higher mode has a shifting peak,
corresponding to a temperature-dependent variable enthalpy
distribution in the denatured ensemble (Fig. 5 of Ref. 72; see
above). In contrast, the denatured enthalpy distribution
of the Go model consists of two widely separated peaks
(Figure 10); the lower one is at $H = -54$ and the higher
one is around $H = -6$ to $-4$. Together with the native
population at $H = -57$, these give rise to a trimodal
distribution of enthalpy. (The native peak is not shown
in Figure 10.)

The data in Figures 7 and 10 imply that the heat
denaturation of the Go model takes place in the
following manner. At low temperatures, $T < 0.5$ for example,
> 95% of the chain population is in the single native
conformation (Figure 3d). As temperature is raised to
$T = 0.65$ - $0.70$, a fraction of the native population
is transferred to a group of low-enthalpy conformations
with $H$'s around $-54$ (Figure 10). There is an enthalpy
(energy) gap of 3 units between the ground state and
the lowest-enthalpy ($H = -54$) nonnative
conformations. Using MC histogram techniques, we estimated
that there are $\sim 10^{5}$ nonnative conformations with
$H < -44$. (For this Go model, the number of native
contacts $Q = -H$.) The heat capacity associated with
these initial thermal transitions is small in comparison
with the heat absorption peak because of the relatively
narrow enthalpy differences between the native and the
low-enthalpy nonnative conformations. As temperature
continues to increase to $\approx T_{1/2} = 0.75$, chains start to
unfold substantially, and a concentration of population
at very high enthalpies ($H \approx -6$) begins to develop.
This temperature coincides with the sharp peak of the
heat capacity function (Figure 7, upper panel), which
reflects the large-enthalpy thermal transitions from both
the single ground-state conformation ($H = -57$) and
the low-enthalpy nonnative conformations ($H \approx -54$ to
$-40$) to the large number of high-enthalpy
conformations around $H \approx -6$. There are non-vanishing chain
populations at enthalpy levels intermediate between the
two nonnative peaks, but they are not appreciable at
any temperature. When the temperature is raised
further to $T = 0.83$ - $0.95$, the population at the single
ground-state and the low-enthalpy nonnative
conformations greatly diminishes and practically all the chains
have enthalpies above $H = -16$.



Why is the Go Model More Cooperative Than Others? 

Several features of this process contribute to the Go
model's relatively high cooperativity. First, unlike the
2-letter model discussed above, the population peak of
high-enthalpy conformations is quite insensitive to
temperature: it shifts by merely $\approx 2$ enthalpy units, from
$H \approx -6$ to $-4$, when the temperature is increased from
$T = 0.75$ to $0.95$ (Figure 10). Second, unlike the 20-letter
models whose single ground-state conformational
populations become < 0.1 when the temperature is raised to
$T_{\max}$ (Figures 6, 9, see above), the population of the
single-conformation Go-model ground state remains
substantial ($\approx 0.3$) at the peak of the heat capacity
function. In fact, all three transition midpoint temperatures
are well within the peak region of Cp for the Go model.
And among the models tested, it is the one with both
$T_{1/2}$ and $T_{\rm d}$ closest to $T_{\max}$, within 1.4% and 0.4%,
respectively (Figure 7, upper panel).

These observations rationalize certain differences in
cooperativity between models. For instance, the Go model
is more cooperative than the 2-letter model in Figure 4
by all $\kappa$ measures in Table II. This is because the Go
model's bimodal distribution of nonnative enthalpies
(i.e., the denatured part of an overall trimodal
distribution) implies that a larger variance in $H$ is possible,
hence a higher peak value for Cp [Eq. (2)], and therefore
a larger $\kappa_2$, than the 2-letter model with a single shifting
broad distribution of denatured enthalpies. The bimodal
denatured enthalpy distribution of the Go model also
means that the average denatured enthalpy near $T_{1/2}$
should be approximately one half of the entire range of
possible enthalpy variations. Hence $\kappa_0$ should be $\approx 0.5$
(Table II indeed gives $\kappa_0 = 0.54$). This is higher than
the $\kappa_0$ of the 2-letter model because the latter's
denatured state is dominated by low-enthalpy conformations
at its $T_{1/2}$. The Go model is also more cooperative than the
20-letter model in Figure 6. For the $\kappa_2$ measure, this is
because at $T_{\max}$ the Go model has $\approx 3$ times as much chain
population [N] in its single ground-state conformation
as the 20-letter model. The highly specific, teleological
interactions of the Go model also lead to much smaller
probabilities for intermediate enthalpies. These factors
translate into the possibility of having a larger variance
in enthalpy distribution, thus a higher peak Cp value,
and hence a higher $\kappa_2$ for the Go model than for the
20-letter model.
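
The variance argument can be checked with a short numerical sketch. It assumes that Eq. (2) relates Cp to the variance of the enthalpy distribution, so that a larger variance at a given temperature means a larger Cp and hence a larger $\kappa_2$. The two distributions below are hypothetical idealizations spanning the same enthalpy range; they are not the simulated distributions of Figure 10.

```python
def variance(dist):
    """Variance of a discrete distribution given as {H: weight}."""
    total = sum(dist.values())
    mean = sum(h * p for h, p in dist.items()) / total
    return sum(p * (h - mean) ** 2 for h, p in dist.items()) / total

# Hypothetical distributions over the same range of enthalpies.
bimodal = {-54: 0.5, -6: 0.5}               # two sharp, well-separated peaks
broad = {h: 1.0 for h in range(-54, -5)}    # flat over H = -54, ..., -6

print(variance(bimodal))  # 0.25 * 48**2 = 576
print(variance(broad))    # (49**2 - 1) / 12 = 200
```

For equal ranges, concentrating the population at the two extremes gives nearly three times the variance of a flat distribution in this toy comparison, which is the sense in which widely separated peaks with little intermediate population favor a higher heat capacity peak.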

Summary of Analysis With No Baseline Subtractions 

The analysis above has shown that none of the models 
tested is calorimetrically two-state, though there are wide 
variations in their deviation from being so. For models 
with relatively high cooperativities such as the 36mer 
20-letter model and the 48mer Go model, this conclusion 
is still somewhat tentative because baseline subtraction
schemes 22,23 are yet to be explored (see below). These
schemes can lead to effectively higher $\kappa$'s (Ref. 22).
However, for models that deviate far from
$\Delta H_{\rm vH}/\Delta H_{\rm cal} \approx 1$
for all van't Hoff enthalpies considered, in particular the 
modified HP and short sidechain models, the analysis 
carried out so far is already quite sufficient in establish- 
ing that they are not good thermodynamic models for 
real calorimetrically two-state proteins. 

It is noteworthy that the present three-dimensional 
48mer Go model is significantly more cooperative by
the $\kappa_2$ criterion (= 0.87) than a two-dimensional 18mer
Go model studied previously ($\kappa_2 = 0.64$). 23 Apparently,
the longer chain length, the ability to form a three- 
dimensional core, and even the particular fold topology 
of the present Go model might have contributed to its 
higher calorimetric cooperativity. These factors need to 
be better elucidated. As we have emphasized, the interactions
in Go models are highly artificial as they are not
based explicitly on a set of plausible microscopic phys- 
ical interactions. But Go model results are nonetheless 
instructive as they may highlight intrinsic limitations 
to what can be achieved by contact interactions. At 
least in the context of an underlying flexible polymer 
model, the above observations on all six models suggest 
that there always exist conformations with enthalpies
(energies) close to the ground state, even when confor- 
mational distribution is governed by the highly specific 
Go potential. This raises the question as to whether it 
is natural to group them together with the ground-state 
conformation 46 to define a multiple-lattice-conformation 
native state as advocated by the authors of 20-letter 
models. 48,49 As will be seen below, this is a substantive 
physical question, not merely an issue of semantics. In 
fact, it is directly relevant to gaining a better physical 
understanding of baseline subtraction and devising more 
appropriate means to compare model predictions with 
calorimetric experiments. 

Effects of discarding a part of model specific heat 
capacity to mimic experimental baseline subtrac- 
tions. 

Physical Meaning of Baseline Subtractions 

As a first approximation, we have so far assumed, as
in a previous study, 23 that the heat capacity functions
predicted by simple protein lattice models are directly
comparable to the standard "transition part" of
experimental excess heat capacity functions. The latter were
obtained from calorimetric data by subtracting a
sigmoidal weighted baseline after first subtracting the buffer
baseline. 23,36,41 This follows from the conventional
experimental interpretation 33,36,51 that only the peak region
of Cp involves appreciable heat capacity contributions
from thermal transitions between conformations that are
both structurally and enthalpically significantly different
from one another. In this conventional view, by
subtracting the baselines, the heat capacity contributions
discarded were essentially only those from solvation
effects and small-amplitude motions of the protein, i.e.,
contributions that are regarded as unimportant in
accounting for significant conformational changes. This
assumption also underlies the standard empirical approach
of using temperature-independent solvent accessible
surface areas for both the folded and the unfolded states
of a protein in thermodynamic analyses of calorimetric
data. 33,36 However, this picture does not correspond
exactly to the properties of polymer protein models, which
invariably predict a non-negligible heat capacity
contribution from conformational transitions well above the
peak Cp transition region, though the amount of this
contribution varies from model to model (see below).

There are other reasons to believe that the real physical
situation may be more complex than the picture implied
by our first approximation and the conventional empirical
interpretation of calorimetric data. Bond vector motions
measured by NMR spin relaxation indicate that protein
backbone fluctuations contribute 8-14 cal mol$^{-1}$ K$^{-1}$
per residue, 80,81 and thus account for $\sim 20\%$ of the heat
capacity of an unfolded protein. On the other hand,
similar measurements on the folded state of two proteins
suggest that backbone fluctuations on average contribute
only 0.5 cal mol$^{-1}$ K$^{-1}$ per residue, and account for $\lesssim 1\%$
of the heat capacity of the native state. While the
connection between NMR-measured bond vector motions
and conformational diversity remains to be better
elucidated, the huge difference in heat capacity contribution
from backbone motions between the folded and unfolded
states strongly suggests that the possibility of enthalpic
transitions between structurally dissimilar conformations
in the denatured ensemble cannot be neglected, and that
conventional baseline subtractions might have discarded
heat capacity contributions from these transitions.

More recently, a molecular dynamics simulation study
using implicit solvent interactions also suggests that in
addition to differences in solvation effects, there are
significant heat capacity contributions to the difference
between native and denatured baselines from noncovalent
intraprotein interactions. 38 While the heat capacity
contributions from model vibrational motions of the covalent
bonds 82 are essentially the same in the native and the
denatured states, these simulations suggest that
noncovalent interactions change more with temperature in
nonnative conformations than in the native state. 38 Owing
to limited sampling, large numerical uncertainties were
reported in this molecular dynamics study. Nonetheless,
its prediction that on average non-solvation intraprotein
interactions account for $\lesssim 71\%$ of the heat capacity
difference between native and denatured baselines (Table 2
of Ref. 38) appears to be consistent with the NMR
experiments described above: If we perform a rough estimate
based on cytochrome c data (Ref. 33), and take $\sim 16$-$23$
cal mol$^{-1}$ K$^{-1}$ per residue to be typical native-denatured
baseline differences, the NMR results 81 suggest that
$\lesssim 50$-$70\%$ of this difference may originate from the difference
in backbone motions in the native vs. the denatured
state, which is in the same range as the average
molecular dynamics result.

From a polymer perspective, it is also intuitive to 
expect non-vanishing heat capacity contributions from 
thermal transitions between conformations at different 
enthalpic levels with significant structural differences
even at temperatures above the peak Cp region. Given 
the immense diversity in conformational structures, it is 
physically quite inconceivable how enthalpic diversity in 
the denatured ensemble can be entirely eliminated such 
that it behaves as if all conformations occupy only a sin- 
gle enthalpy level, which would have meant that all in- 
traprotein solvent-mediated interactions in the denatured 
ensemble were exclusively entropic. 

Among the lattice protein models evaluated here, in 
which we have taken all interactions to be enthalpic for 
simplicity, even the heat capacity function of the Go 
model with relatively high calorimetric cooperativity has
a long high-temperature tail (Figure 7, upper panel). 
This indicates that for this model, non-vanishing contri- 
butions to Cp from conformational transitions are not 
negligible at high temperatures. A relatively long (na- 
tive) tail extending to temperatures far lower than the 
peak Cp region is also present for the two 20-letter mod- 
els (Figures 6, 9, upper panels). On the other hand, in 
conventional analyses of calorimetric data, no such long 
tails are ever present to be considered in the transition 
part of the excess heat capacity function obtained from 
baseline subtractions. 36,41,50,51,54 Even in calorimetric
analyses of non-cooperative nonprotein homopolymers, 83 
their existence is routinely precluded by empirical base- 
line subtraction techniques. This mismatch between 
theoretical predictions and standard transition excess 
heat capacities necessitates a closer examination of the 
correspondence between the physical pictures emerging 
from polymer protein models and the conventional inter- 
pretation of calorimetric experiments. 

Applying Baseline Subtractions to Model Heat Capacities
Can Result in Higher Predicted Calorimetric Cooperativities

We now explore the effects of using an ad hoc empir- 
ical procedure, similar to what has been carried out on 
experimental calorimetric data, to eliminate both the na- 
tive and denatured tails in model Cp functions plotted in 
the upper panels of Figures 4-9. Physically, this exercise 
was motivated by our recognition, based on the evidence 
above, that conventional calorimetric baseline analyses 
might have subtracted out "tail" contributions that are
relevant for the evaluation of polymer model predictions. 
Hence, as an effort to put theoretical predictions on the
same footing as the (no-tail) experimental transition excess
heat capacities, we now perform baseline subtractions on
model data to eliminate their tail contributions.
We do expect, nonetheless, that the corresponding "tail" 
contributions in real experimental data are only a mi- 
nor part of the heat capacity contributions discarded by 
conventional baseline subtraction on calorimetric mea- 
surements. There are reasons to expect that conventional 
interpretation is at least partially correct in that a major- 
ity of the contributions subtracted by standard baseline 
analyses are indeed heat capacity contributions from
solvation effects and small-amplitude protein motions. For
instance, the molecular dynamics simulation discussed
above estimated 38 that only $\lesssim 11\%$ of native-state
heat capacity came from non-covalent interactions.

Following standard experimental procedures, 50,51 (see 
also Ref. 22) baselines are constructed as plausible linear 
extrapolations from low temperature and high temper- 
ature parts of the Cp function to its peak region; they 
are referred to as native (low temperature) and dena- 
tured (high temperature) baselines. These constructions 
are depicted in Figure 11 and the upper panels of Fig- 
ures 12 and 13 for the six lattice protein models we have 
been considering. More details are described in the caption
for Figure 11. Baseline subtraction has two opposite
effects on the predicted calorimetric cooperativity.
On one hand, it decreases the value of the calorimetric
enthalpy, because some areas under the Cp curve are
excluded from the integration for $\Delta H_{\rm cal}$. This
tends to increase the $\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratio.
On the other hand, it decreases the effective peak value of
the heat capacity. This tends to decrease the
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratio, as $\Delta H_{\rm vH}$
is proportional to the effective peak Cp value or its
square root. Here we define an effective
post-baseline-subtraction $\Delta H_{\rm vH}/\Delta H_{\rm cal}$
ratio by substituting the new effective peak heat capacity
and effective calorimetric enthalpy into the expression for
$\kappa_2$ in Eq. (6):

Table III shows that for all six models, baseline
subtractions lead to increases in apparent (effective)
calorimetric cooperativity. However, both the modified HP
model ($\kappa_2^{(s)} = 0.41$) and the short 20-letter
sidechain model ($\kappa_2^{(s)} = 0.54$) remain very far
away from being calorimetrically two-state, despite some
improvements. On the other hand, the effective calorimetric
cooperativities of the 2- and 3-letter models increase
dramatically (from $\kappa_2 = 0.36$ and $0.46$ to
$\approx 0.94$) after large areas (thick denatured tails)
under their Cp functions have been subtracted out
(Figure 11a and upper panel of Figure 12). Remarkably, the
Go model's $\kappa_2^{(s)}$ of 1.00 now meets the
experimental standard. The 36mer 20-letter model's
$\kappa_2^{(s)}$ also rises above 0.94 (upper panel of
Figure 13). We will use the 27mer 3-letter and the 36mer
20-letter models to discuss the physical implications of
these enhancements of apparent calorimetric cooperativity
by baseline subtractions.
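
The two opposing effects of baseline subtraction can be
illustrated with a minimal sketch on synthetic data (not on
any of the six lattice models). Several ingredients are
assumptions made for illustration only: the ratio is taken
to have the form 2 T_max sqrt(k_B C_P,max) / ΔH_cal as a
stand-in for Eq. (6), which is not reproduced here; the
native baseline is applied below T_max and the denatured
baseline above it; the effective peak height is measured
from the mean of the two baselines at T_max; and all
function names, windows and parameter values are
hypothetical.

```python
import math

K_B = 1.0  # Boltzmann constant in model units (assumption)

def two_state_cp(t, h_d=60.0, ln_g=60.0):
    """Heat capacity of a strictly two-state model (synthetic test data only)."""
    x = math.exp(ln_g - h_d / (K_B * t))
    return h_d ** 2 * x / (K_B * t ** 2 * (1.0 + x) ** 2)

def fit_line(ts, cs):
    """Least-squares straight line c = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean, c_mean = sum(ts) / n, sum(cs) / n
    b = (sum((t - t_mean) * (c - c_mean) for t, c in zip(ts, cs))
         / sum((t - t_mean) ** 2 for t in ts))
    return c_mean - b * t_mean, b

def trapezoid(ts, ys):
    """Trapezoidal integral of ys over ts."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i])
               for i in range(len(ts) - 1))

def ratio(t_max, cp_peak, dh_cal):
    """Assumed form of the van't Hoff to calorimetric enthalpy ratio."""
    return 2.0 * t_max * math.sqrt(K_B * cp_peak) / dh_cal

def raw_ratio(ts, cp):
    """Ratio with the Cp = 0 axis taken as the baseline."""
    i_max = max(range(len(ts)), key=lambda i: cp[i])
    return ratio(ts[i_max], cp[i_max], trapezoid(ts, cp))

def subtracted_ratio(ts, cp, native_window, denatured_window):
    """Ratio after subtracting linear native and denatured baselines."""
    def window_fit(window):
        pts = [(t, c) for t, c in zip(ts, cp) if window[0] <= t <= window[1]]
        return fit_line([p[0] for p in pts], [p[1] for p in pts])
    a_n, b_n = window_fit(native_window)
    a_d, b_d = window_fit(denatured_window)
    i_max = max(range(len(ts)), key=lambda i: cp[i])
    t_max = ts[i_max]
    # Assumption: native baseline below T_max, denatured baseline above it.
    excess = [c - (a_n + b_n * t if t < t_max else a_d + b_d * t)
              for t, c in zip(ts, cp)]
    # Assumption: peak height is measured from the mean of both baselines at T_max.
    base_at_peak = 0.5 * ((a_n + b_n * t_max) + (a_d + b_d * t_max))
    return ratio(t_max, cp[i_max] - base_at_peak, trapezoid(ts, excess))

# Synthetic example: a sharp two-state peak sitting on a thick linear "tail".
ts = [0.6 + 0.0005 * i for i in range(1601)]
cp = [two_state_cp(t) + 50.0 + 100.0 * t for t in ts]
print("no subtraction:  ", round(raw_ratio(ts, cp), 2))
print("with subtraction:", round(subtracted_ratio(ts, cp, (0.6, 0.7), (1.3, 1.4)), 2))
```

In this toy case the tail inflates the integrated area far
more than it raises the peak, so the unsubtracted ratio is
low, while subtraction removes the tail from both the area
and the peak height and brings the ratio back close to
unity.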

Nonlinear "Formal Two-State" Baselines and Multiple- 
Conformation Native States 

Recently, Zhou et al. made a pertinent observation
that any density of states can be formally decomposed
into two arbitrary "states," and that its thermal behavior
can be made to satisfy the calorimetric two-state criterion
if one is willing to introduce (non-standard) nonlinear
baseline subtractions. 22 To gain further insight into the
physical meaning of baseline subtractions, we found it
instructive to compare and contrast the present empirical
analysis with their construction. Here is a brief
summary of their formulation (in our notation). Any
partition function $Q$ can be written as a sum of a pair
of partition functions for two "states," denoted here by
"0" and "1"; viz., $Q(T) = Q_0(T) + Q_1(T)$. Let $(C_P)_0$
and $(C_P)_1$ be the individual heat capacities of the two
states, computed from $Q_0$ and $Q_1$ respectively, and
$T_m$ be the midpoint temperature at which the populations
in the two states are equal, i.e., $Q_0(T_m) = Q_1(T_m)$.
Zhou et al.'s baselines are defined by the individual heat
capacities: $(C_P)_0(T)$ for $T < T_m$ and $(C_P)_1(T)$ for
$T > T_m$. Naturally, a calorimetric enthalpy
$\Delta_f H_{\rm cal}$ is defined to be the area between the
$C_P$ curve and this baseline, and a midpoint heat capacity
value $\Delta_f C_P(T_m) = C_P(T_m) - [(C_P)_0(T_m) +
(C_P)_1(T_m)]/2$. A population-based van't Hoff enthalpy
$\Delta_f H_{\rm vH}(T)$ is then computed using Eq. (3) above
with $\theta = Q_1(T)/Q(T)$. Zhou et al. showed that in
general $\Delta_f H_{\rm vH}(T_m) = \Delta_f H_{\rm cal} =
4 k_B T_m^2 \Delta_f C_P(T_m)/\Delta_f H_{\rm cal}$
[Eqs. (3), (4), (12) and (15) of Ref. 22]. This identity,
which corresponds to $\kappa_0 = (\kappa_1)^2 = 1$ if
$T_{1/2}$ is formally replaced by $T_m$ [see Eq. (6)],
means that the calorimetric two-state condition is always
satisfied with this particular choice of baselines.
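
The midpoint part of this identity can be checked in a few
lines. The following is a sketch rather than a reproduction
of Ref. 22: it assumes that Eq. (3) has the standard
population-based form
$\Delta H_{\rm vH} = k_B T^2\, d\ln[\theta/(1-\theta)]/dT$,
and it omits the baseline-specific subscript on $\Delta$.

```latex
\begin{align}
\langle H\rangle_i &= k_B T^2\,\frac{d\ln Q_i}{dT}, \qquad i = 0, 1, \\
\Delta H_{\rm vH}(T) &= k_B T^2\,\frac{d}{dT}\ln\frac{Q_1}{Q_0}
    = \langle H\rangle_1 - \langle H\rangle_0 , \\
\frac{d\theta}{dT} &= \frac{\theta(1-\theta)\,\Delta H_{\rm vH}(T)}{k_B T^2}, \\
C_P &= \frac{d}{dT}\Big[(1-\theta)\langle H\rangle_0 + \theta\langle H\rangle_1\Big]
    = (1-\theta)(C_P)_0 + \theta\,(C_P)_1
      + \frac{\theta(1-\theta)\,[\Delta H_{\rm vH}(T)]^2}{k_B T^2}.
\end{align}
```

Setting $\theta = 1/2$ at $T = T_m$ gives
$C_P(T_m) - [(C_P)_0(T_m) + (C_P)_1(T_m)]/2 =
[\Delta H_{\rm vH}(T_m)]^2/(4 k_B T_m^2)$, i.e., the midpoint
excess heat capacity is fixed entirely by the van't Hoff
enthalpy at $T_m$. The remaining equality with the
calorimetric enthalpy is the part established in Ref. 22
and is not rederived here.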

We have computed baselines for the six models accord- 
ing to this recipe 22 and included them as dotted curves 
in Figure 11 and the upper panels of Figures 12 and 13. 
(In the discussion below, they are referred to simply as
"nonlinear baselines.") For models that assume a
single-conformation native state, 44-47 $Q_0 = Q_N$ and
$Q_1 = Q_D$. For the two 20-letter models, $Q_0$ is
constructed as the partition function for the
multiple-conformation native state defined by the original
authors, 48,49 while $Q_1$ is defined to account for the
rest of the conformations. These nonlinear "formal
two-state" baselines are conceptually enlightening (see
below); however, it is our view that they should not be
used directly to evaluate protein models. The first reason
is logical: since by construction they always lead to
perfect agreement with the calorimetric two-state
condition, using them on model systems would abolish the
substantive physical question of whether polymer protein
models conform to the experimental calorimetric
requirements. Second, and more importantly, such baselines
have not been used by experimentalists to analyze
calorimetric data. For all cases
studied here, these nonlinear baselines invariably sub- 
tract more from the peak Cp region than conventional 
linear or weighted baselines (Figures 11-13). This means 
that using these nonlinear baselines on model Cp func- 
tions would most likely lead to an effective heat capacity 
function that does not physically match the experimen- 
tal transition excess heat capacity function, and thus 
would make it extremely difficult to conduct meaningful 
comparisons between theory and experiment. 23 

Much insight can be gained, however, by comparing 
the nonlinear baselines with the ad hoc empirical linear 
baselines we used. As the nonlinear baselines of Zhou et 
al. 22 are guaranteed to produce perfect (apparent) calori- 
metrically two-state behaviors, it is not unreasonable to 
expect that if the linear baselines are close to the non- 
linear baselines, the apparent calorimetric cooperativity 
predicted by the linear baselines would be high, and vice 
versa. This appears to hold for five out of our six cases: 
Relatively high apparent calorimetric cooperativities re- 
sulted from linear baseline subtractions for the 2-letter, 
3-letter, and 36mer 20-letter models (Table III); and as 
expected their linear and nonlinear baselines are quite 
close (Figure 11a, upper panels of Figures 12 and 13). On 
the other hand, the nonlinear baselines are very far away 
from the empirical linear baselines used for the modi- 
fied HP and the 15mer sidechain models. Not surpris- 
ingly, their apparent calorimetric cooperativities remain 
low even after linear baseline subtractions (Figures 11c 
and d and Table III). 

The only exception is the Go model (Figure 11b), for
which the nonlinear denatured baseline amounts to a 
dominant contribution to the overall heat capacity, and is 
very far from the empirical linear denatured baseline. Yet 
the Go model is the most cooperative among the models 
we evaluated, especially after linear baseline subtractions 
(Table III). The reason for this behavior is that we
have taken the denatured state of this model to be the
ensemble that encompasses all non-ground-state
conformations. And since the enthalpy distribution among
these nonnative conformations is bimodal (Figure 10), the
nonlinear denatured baseline, which is the denatured heat
capacity $(C_P)_1 = (C_P)_D$, involves large thermal
transitions between the two denatured peaks. This accounts
for its high magnitudes. In addition, owing to the adoption
of a single-conformation native state, there is no
nonlinear native baseline in the present consideration of
this Go model. On the other hand, if a
multiple-conformation native state were adopted to
incorporate the low-enthalpy conformations that are now
being classified as denatured, it would have resulted in
nonlinear baselines for both the alternately defined native
and denatured states. Adoption of such a
multiple-conformation native state would lead to the
elimination of contributions to $(C_P)_1$ from large
thermal transitions between the two enthalpy peaks in
Figure 10, and hence a nonlinear denatured baseline with
much reduced magnitudes. It is expected that the nonlinear
baselines would then be much closer to the empirical linear
baselines used in our analysis, and would give rise to a
situation much more similar to the 36mer 20-letter case, to
be discussed below.

For the 36mer 20-letter model (Figure 13), the (low 
temperature) nonlinear native baseline derived from a 
multiple-conformation definition of the native state 
is almost identical to the empirical linear native base- 
line. By construction, a nonlinear native baseline ac- 
counts for the heat capacity contribution from thermal 
transitions among the multiple conformations of the na- 
tive state. Therefore, when an empirical linear native 
baseline essentially overlaps a particular nonlinear na- 
tive baseline, and we use the empirical linear baseline 
for subtraction, we are effectively (empirically) adopting 
the multiple-conformation native state that underlies the 
construction of the given nonlinear native baseline. More 
generally, when empirical linear baselines for both the
native and denatured states overlap significantly with
their nonlinear counterparts for a particular formal
two-state definition, 22,49 and $T_{\max} \approx T_m$, as
in this particular 20-letter case (Figure 13, upper panel),
the empirical linear baseline subtraction scheme may be
viewed as an empirical (approximate) adoption of the given
formal two-state definition for the native and denatured
states. Hence, it follows from the "formal two-state"
consideration 22 that such an empirical subtraction would
lead to closer conformity to the calorimetric two-state
criterion, as observed here.

The 3-Letter (3LC) Model Predicts Significant Post- 
Denaturational Chain Expansion — Comparison with 
SAXS Experiments 

We now broaden our attention to other thermodynamic
properties. Obviously, adherence to the calorimetric
two-state criterion is only one of many physical properties
of real two-state proteins. Therefore, to ascertain
whether a model with high apparent calorimetric
cooperativity is adequate for generic properties of real
two-state proteins, we should also subject its other
properties to further experimental evaluation. In this
spirit, we now consider the 3-letter model in more detail.
This model uses a single-conformation native state, 45 and
its apparent calorimetric cooperativity is quite high after
empirical baseline subtractions, $\kappa_2^{(s)} = 0.952$.
Its behavior is expected to be representative of lattice
protein models that are based on additive pairwise contact
energies and have small numbers of monomer (residue) types
in their alphabets. For instance, in many respects the
properties of the 3-letter model are similar to those of
the 2-letter model, which also attains a high apparent
calorimetric cooperativity after baseline subtractions
(Table III). As discussed above, the 3-letter model is
instrumental in Onuchic et al.'s $T_f/T_g = 1.6$ estimate
for small α-helical proteins. 59

One thermodynamic property accessible to experimen- 
tal determination is the dimension of a protein, measured 
by its average (i.e., mean-square) radius of gyration $R_g$
as a function of temperature. Using the MC histogram
method, we have computed this function for the 3-letter
model (Figure 12, lower panel). It shows a very gradual
post-denaturational increase (for $T > T_{1/2},
T_{\max} \approx 1.5$): Average $R_g$ is $\approx 30\%$
larger at higher temperatures than its value at the
high-temperature edge ($T \approx 1.8$) of the peak Cp
transition region.

It appears that this prediction is significantly different
from experimental observations. Sosnick and Trewhella 84
have used small-angle X-ray scattering (SAXS) to monitor
the temperature dependence of $R_g$ of ribonuclease A, one
of the first few proteins shown to be calorimetrically
two-state. 50 They observed no systematic
post-denaturational increase of $R_g$ under either reducing
(no disulfide bonds) or non-reducing conditions. Under
reducing conditions (which more closely correspond to the
present lattice chains without crosslinks), the transition
temperature is $\approx 51$°C. Sosnick and Trewhella
observed no continuous chain expansion at temperatures
higher than the relatively narrow transition region at
~ 45 - 54°C. Indeed, there was even a slight decrease in
$R_g$ when the temperature reached 74°C. More recently,
Hagihara et al. 85 used solution X-ray scattering to show
that the temperature dependence of $R_g$ during heat
denaturation of ribonuclease A and cytochrome c can be well
approximated by a strictly two-state model. Plaxco et
al. 64 used SAXS to monitor the dependence of $R_g$ of
protein L on guanidine hydrochloride concentration. They
also did not observe any trend of post-denaturational
expansion.

The significant post-denaturational chain expansion
predicted by the 3-letter model is directly related to
a substantial heat-induced shifting of its denatured
enthalpy distribution, as evident from its thick
high-temperature Cp tail. This behavior is similar to that
noted above for the 2-letter model. The discrepancy
between this 3-letter model's $R_g$ prediction and
experiment 4 suggests that, in spite of its relatively high
apparent calorimetric cooperativity after empirical
baseline subtractions, it suffers from essential
deficiencies as a model for real two-state proteins because
of the broad and shifting enthalpy distribution among its
denatured conformations. 5 This observation is consistent
with the proposal above that the ratio $T_f/T_g = 1.6$
deduced from the 3-letter model is most likely an
underestimate for real two-state proteins.

Multiple-Conformation Native State and Non-Native
Contacts in the 20-Letter Model

Finally, we examine in more detail the thermodynamics
of the 36mer 20-letter model (Figures 13-15). This model
has an apparent calorimetric cooperativity
($\kappa_2^{(s)} = 0.943$) similar to that of the 2- and
3-letter models (Table III). Its model potential is the
basis of a large body of interesting work, 11 and is
expected to be representative of lattice protein models
that are based on additive pairwise contact energies, with
a large but finite number of monomer types in their
alphabets, and a substantial fraction of the contact
interactions being repulsive. 19,72,77 Here it also serves
to exemplify models with a multiple-lattice-conformation
native state. 6

The lower panel of Figure 13 shows how the
folding/denaturation transition of this model chain is
tracked by different thermodynamic order parameters, which
may correspond to different experimental probes. The
population [N] of the single ground-state conformation
begins to drop rapidly well below the Cp peak temperature
$T_{\max}$, whereas $T_{\max}$ essentially coincides with
the midpoint temperatures for all other probes shown. This
is consistent with the observation 19 that in general the
midpoint temperature for [N] is lower than that for
$\langle Q \rangle$. The measure $P(Q > 20)$ shows the
sharpest transition, as it is a binary "formal two-state"
order parameter for which a chain conformation can take
only one of two values: either it is native (has Q > 20),
or not (Q < 20). 7 The order parameter
$\langle Q \rangle / Q_N$ shows a broader transition
because there are 40 possible Q values for this 36mer
chain. For this model, the temperature dependence of
$\langle \chi \rangle$ correlates almost perfectly with
that of $\langle Q \rangle$ (see inset in the upper panel
of Figure 13). These observations illustrate that the
sharpness of a transition 8 can vary significantly
depending on the probe (order parameter), whereas the
calorimetric criterion is a more fundamental measure of
cooperativity 33 because it directly probes the underlying
density of enthalpic states. 23

4 Our conclusion here is based on the fact that the 3-letter
model $R_g$ continues to increase as the temperature is
raised above the peak Cp transition region, and that this
behavior is not observed in experiments. Following this
logic, if the subtraction scheme in Figure 12a is used to
ensure high calorimetric cooperativity, there should be no
appreciable increase in model $R_g$ for $T > 2.2$ if the
prediction is to be consistent with experiment. But this is
not the case (Figure 12b). We believe this reflects the
main physical difference between this model and
experimental observation. We note, however, that a direct
mapping of temperatures between the 3-letter model results
and experiment is not possible because they are systems of
very different sizes. For instance, the peak Cp transition
regions for real proteins cover a range of 10 - 20 degrees
(Refs. 50, 84). However, if we choose an energy unit to
equate the 3-letter model $T_{\max} \approx 1.51$ with the
ribonuclease A midpoint temperature of 51°C, the model peak
Cp transition region would translate into a temperature
range of ~ 130 degrees.

5 Incorporation of empirical baseline subtractions does not
change our previous conclusion that additive hydrophobic
interactions are insufficient for calorimetric two-state
cooperativity. For the two-dimensional HP, Go and HP+
models analyzed in Ref. 23, application of empirical
baseline subtractions similar to the one used here is not
sufficient for bringing their apparent van't Hoff to
calorimetric enthalpy ratio close to unity. However,
baseline subtractions are able to take the two new models
introduced in Ref. 23 with cooperative interactions much
closer to apparent calorimetric two-state behaviors: After
subtraction, $\kappa_2^{(s)} = 0.90$ for the new
cooperative model with pure enthalpic interactions, and
$\kappa_2^{(s)} = 0.97$ for the model with entropic HH
interaction in Ref. 23. The present consideration of the
3-letter model also generalizes the previous observation
that HP-like nonspecific pairwise additive interactions are
insufficient to account for certain generic thermodynamic
properties of real two-state proteins.

6 Note, however, that a single-conformation native state with
a single ground-state energy $E_N$ was used by the author of
Ref. 11 to define the folding transition temperature $T_f$
for a different lattice model in Ref. 76.

This 20-letter model is a better mimic of real two-state
proteins than the 3-letter model in certain respects. For
instance, its $R_g$ shows no significant
post-denaturational expansion and therefore enjoys better
agreement with the SAXS experiments discussed above
(Figure 15, lower panel). We now briefly touch on two
issues that are likely to be relevant in future assessments
of the 20-letter model's conformity to experimental
two-state behavior. (i) Structural diversity of the native
state: The 20-letter model allows for significant
conformational variation (Figure 14). For this particular
sequence, this leads to the prediction that the native
state has a higher heat capacity contribution from
main-chain-like motions than the fully unfolded state, as
is evident from the higher Cp value in the native tail
region than in the denatured tail region (Figure 13, upper
panel). 8 However, this does not appear to agree with the
NMR experiments discussed above. 81 (ii) The prevalence of
nonnative contacts: For this model, the number of nonnative
contacts undergoes a sharp transition near the heat
absorption peak (Figure 15, upper panel). The average
number is > 3 at $T_{\max}$, reaches a peak of
$\approx 6$ at a temperature slightly higher than
$T_{\max}$, then settles down gradually at a relatively
high average number of $\approx 4.5$ for the
high-temperature unfolded state. Recent NMR experiments
show that nonnative interactions can exist in the compact
denatured states of some proteins, 87,88 but this
phenomenon is not universal. 89 If prevalence of nonnative
contacts is not a generic property of denatured states of
real two-state proteins, it would be important to ascertain
whether the high number of nonnative contacts observed in
this particular sequence reflects a general feature of its
underlying 20-letter contact potential.

Concluding Remarks 

We have examined the implications of calorimetric
two-state cooperativity and other experimentally determined
thermodynamic properties on a protein's density of
enthalpic states. 23,90 In general, they require a narrow
enthalpy distribution among the denatured conformations, as
has been recently proposed. 23 Energy landscape theory 9
has allowed us to make a connection between calorimetric
two-state cooperativity and folding kinetics. Using an
analytical random-energy model, we showed that the folding
landscape parameter $T_f/T_g \approx 6.0$, which is
significantly higher than a previous estimate of ~ 1.6 for
small (~ 60-residue) α-helical proteins. 59 Experimental
observations of single-exponential folding without kinetic
trapping for a number of small single-domain proteins 50-80
residues long with no disulfide bonds 62-67,91-93 are
consistent with either $T_f/T_g \approx 1.6$ or 6.0. This
is because for proteins with $T_f < 100$°C, both ratios
imply a $T_g$ far lower than any temperature at which
folding kinetic experiments have been conducted
($T_g < 233$ K or $< 62$ K). In general, the present
random-energy-model results also imply that folding of all
calorimetric two-state proteins should not be affected by
kinetic traps. However, this does not appear to agree with
experiment. Notable counter-examples include the
calorimetrically two-state 33 lysozyme 94,95 and
cytochrome c. 96 This underscores an intrinsic limitation
of the random-energy-model method because it is not a
chain-based approach and does not address sequence-specific
properties.
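
The glass-temperature bounds quoted above follow from
dividing the upper bound on the folding temperature by each
ratio. A minimal arithmetic sketch, taking 100°C = 373.15 K
as the bound on T_f (the function name is hypothetical):

```python
def glass_temperature_bound(t_f_kelvin, ratio):
    """Upper bound on T_g implied by T_f/T_g = ratio and T_f < t_f_kelvin."""
    return t_f_kelvin / ratio

T_F_MAX = 100.0 + 273.15  # 100 degrees C expressed in kelvin

for ratio in (1.6, 6.0):
    bound = glass_temperature_bound(T_F_MAX, ratio)
    print(f"Tf/Tg = {ratio}: Tg < {bound:.0f} K")
```

Both bounds lie well below the freezing point of water,
which is why folding kinetics experiments cannot by
themselves discriminate between the two ratios.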

We have evaluated six lattice protein models against
the calorimetric two-state criterion. The initial stage
of our analysis treated the native state as a single
lattice conformation. This was based on the assumption
in conventional analyses of calorimetric data, which have
identified the native state as the structure deposited in
the Protein Data Bank. 33,36 Therefore, as in a previous
investigation, 23 we first evaluated
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratios directly from
the model Cp functions, without any baseline subtractions
(i.e., the baseline was first taken to be simply the
Cp = 0 axis). In this evaluation, none of the models came
close to meeting the calorimetric two-state standard. This
is consistent with our previous conclusion, based on
two-dimensional models, that when the native state is
considered to consist of a single conformation, pairwise
additive contact interactions are insufficient for
calorimetric two-state cooperativity. 23

7 Using the MC histogram technique, we estimated that there
are ~ $4.4 \times 10^9$ different conformations in this
20-letter sequence's Q > 20 native state. This is
$> 10^4$ times more than the ~ $10^5$ low-enthalpy
conformations in the Go model (see above), notwithstanding
that a 48mer Go model's total number of conformations is
$\approx (4.68)^{(48-36)} = 1.1 \times 10^8$ times that of
this 36mer model. 86 This shows that if a
multiple-conformation native state were to be defined for
the Go model, its conformational diversity would be much
smaller than the one in this 20-letter model.

8 A recent Go-like continuum three-helix bundle model also
predicts a higher heat capacity for the native state than the
denatured state. 22

However, based on both theoretical and experimental 
considerations, principally data from NMR bond vector 
motion measurements, 81 we have come to believe that 
it would be profitable to explore using empirical linear 
(nonzero) baselines to subtract out "tail contributions" 
from model Cp functions so as to compare them on a 
more equal footing with experimental transition excess 
heat capacity functions. We have therefore taken the 
second step of incorporating empirical baseline
subtractions in our model evaluation. Analysis of a
20-letter lattice model indicates that subtracting a
nonzero native baseline amounts to a re-definition of the
native state. Physically, the empirical subtraction
operation is roughly equivalent to (i) classifying more
conformations as native, (ii) including their contributions
in the thermodynamic properties of a multiple-conformation
native state, and (iii) excluding thermal transitions among
these multiple native conformations from contributing to
the subtracted heat capacity function.

After baseline subtractions, the Go model meets the
calorimetric two-state standard. However, while the
teleological Go potential is extremely useful for posing
"what if" questions, 43,46 whether and how it can be
rationalized in terms of physically plausible interactions
remains to be clarified. Among models with a finite
alphabet of residue types, the apparent
$\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratio for the 36mer
20-letter model is relatively high after empirical baseline
subtraction, though it still falls short of meeting the
high experimental standard for two-state cooperativity.
(Its $(\kappa_2^{(s)})^2 = 0.89$; the corresponding ratio
for real two-state proteins is $\approx 0.96$.) Other
models with smaller alphabets or shorter chain lengths
either have low $\Delta H_{\rm vH}/\Delta H_{\rm cal}$
ratios or exhibit significant post-denaturational chain
expansions that appear to contradict X-ray scattering
experiments. 84,85 This suggests that a relatively high
level of interaction heterogeneity, as characterized by a
larger alphabet 11,97-99 and the presence of repulsive
interactions, 19,72,77 is necessary for more proteinlike
thermodynamic cooperativity.

The low-temperature tails in the Cp functions of the
36mer 20-letter and the Go models before baseline
subtractions are direct consequences of the low-enthalpy
conformational diversity embodied in the
multiple-conformation native state of the 36mer 20-letter
model, and the existence in the Go model of ~ $10^5$
conformations with enthalpies very close to its ground
state. This suggests that, for flexible heteropolymer
models that achieve high apparent calorimetric
cooperativity with only pairwise additive contact
interactions, the native state effectively defined by an
empirical native baseline would inevitably involve
significant conformational fluctuation (as modeled here by
different discrete lattice conformations). If one assumes
that this model prediction captures at least partially the
properties of real proteins, this would imply that the
a posteriori experimental calorimetric "native state"
defined operationally by empirical baseline subtractions
may involve significant conformational diversity, and
therefore may be qualitatively different from the a priori
single-conformation native state used in conventional
interpretation. 33,36

One of the main goals of this study was to ascertain the 
degree to which proteinlike thermodynamic cooperativity 
can be achieved by simple models, especially the question 
as to whether pairwise additive contact interactions are 
sufficient. This is part of an effort to delineate the extent 
to which existing simple protein models capture generic 
protein properties. 37 This issue is also relevant to a re- 
lated question regarding the sufficiency of contact inter- 
actions for protein structure prediction. 100 Our analysis 
of the 36mer 20-letter model is particularly instructive. 
Its apparent calorimetric cooperativity is relatively high 
after empirical baseline subtractions. However, how well
its predicted native conformational diversity matches
that in real proteins remains to be further investigated,
especially in view of the apparent discrepancy between
NMR main-chain bond vector motion measurements and
the relative magnitudes of the native and unfolded heat
capacities in this model.

Conventional interpretation of calorimetric data has 
been premised on a single-conformation, X-ray crystal- 
structure-like native state. The present analysis suggests 
a new perspective that involves a higher degree of con- 
formational heterogeneity, namely (i) the possibility of 
a multiple-conformation native state, and (ii) the possi- 
bility that conventional baseline subtractions could have 
masked a non-negligible post-denaturational change in 
chain dimension driven by thermal transitions among de- 
natured conformations at different enthalpic levels. In 
this alternate scenario, the relationship between calori- 
metric two-state cooperativity and a protein's underly- 
ing enthalpic density of states becomes more complex. 
Nonetheless, if one characterizes the thermodynamics 
of real two-state proteins by both the calorimetric two- 
state criterion and the experimental observation 84 ' 85 that 
no significant post-denaturational chain expansion took 
place, one central aspect of the physical picture 23 re- 
mains essentially the same: For thermodynamically two- 
state proteins, there is no significant post-denaturational 
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shifting of the enthalpy distribution among the confor- 
mations of the denatured state relative to the average 
enthalpy of the (multiple-conformation) native state. On 
the other hand, a corresponding pre-denaturational shift- 
ing (i.e., under native conditions) docs not contradict 
the experimental observations. This is consistent with 
the multiple-state picture 101,102 emerging from native- 
state hydrogen exchange experiments, 103,104 as has been 
discussed. 23 However, it is noteworthy that the base- 
line analysis in the present work does raise the possi- 
bility that parts of the structural fluctuation revealed by 
native-state hydrogen exchange can in principle corre- 
spond to conformational diversities that have been oper- 
ationally absorbed into the baseline-defined calorimetric 
native state. 
Appendix

Statistical mechanics of a strictly two-state model.

Here we describe basic thermodynamics of a strictly 
two-state model, which may be viewed as the $\sigma_H \to 0$
limit of the random-energy model given by Eq. (7) above. 
The simplicity of this extreme case makes it useful for 
further elucidating the relationship among different mid- 
point temperatures and van't Hoff enthalpies in the anal- 
ysis of calorimetric cooperativity. The strictly two-state 
model is given by the partition function 



$$Q(T) = 1 + g_D\, e^{-H_D/(k_B T)}, \tag{A1}$$



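where $g_D$ of Eq. (7) now counts the conformations at a single denatured enthalpy level $H_D$, because we consider a discrete rather than a continuous density of states. 23 For this model, $\Delta H_{\rm cal} = H_D$, and the average enthalpy is

$$\langle H(T)\rangle = \frac{H_D\, g_D\, e^{-H_D/(k_B T)}}{1 + g_D\, e^{-H_D/(k_B T)}}. \tag{A2}$$

It follows that the specific heat capacity is

$$C_P = \frac{d\langle H(T)\rangle}{dT} = \frac{H_D^2\, g_D\, e^{-H_D/(k_B T)}}{k_B T^2 \left(1 + g_D\, e^{-H_D/(k_B T)}\right)^2}. \tag{A3}$$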



This functional form gives a single maximum value for $C_P$ at a certain $T = T_{\max}$. The relation between $T_{\max}$ and the population midpoint temperature

$$T_{1/2} = \frac{H_D}{k_B \ln g_D} \tag{A4}$$

may be determined as follows. First, we note that the slope of the specific heat function at the population midpoint is negative:

$$\left.\frac{dC_P}{dT}\right|_{T=T_{1/2}} = -\frac{H_D^2}{2 k_B (T_{1/2})^3} < 0. \tag{A5}$$



This establishes $T_{1/2} > T_{\max}$ for a strictly two-state model. We then seek a good estimate of $T_{\max}$ by attempting an approximate solution to the $dC_P/dT = 0$ condition, which is equivalent to the equation

$$g_D\, e^{-\xi} = \frac{\xi - 2}{\xi + 2}, \tag{A6}$$

where $\xi = H_D/(k_B T_{\max})$. For $\ln g_D \gg 1$, which is a reasonable assumption for proteins, as discussed in the text,

$$T_{\max} \approx \frac{H_D}{k_B \left[\ln g_D + 4/(2 + \ln g_D)\right]} < T_{1/2}. \tag{A7}$$

The last inequality follows from Eq. (A4) for $T_{1/2}$ and confirms the conclusion we have drawn from Eq. (A5).
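For readers who wish to retrace the step from the heat capacity of Eq. (A3) to Eqs. (A5)-(A7), the following derivation sketch may help. The shorthands $x \equiv g_D e^{-H_D/(k_B T)}$ and $\xi(T) \equiv H_D/(k_B T)$ are notational conveniences assumed here (with $\xi$ reducing to the $\xi$ of Eq. (A6) at $T = T_{\max}$); they are not taken from the original text.

```latex
\begin{align*}
C_P &= k_B\,\xi^2\,\frac{x}{(1+x)^2},
\qquad \frac{d\xi}{dT} = -\frac{\xi}{T},
\qquad \frac{dx}{dT} = \frac{\xi\,x}{T}, \\[4pt]
\frac{dC_P}{dT} &= \frac{k_B\,\xi^2\,x}{T\,(1+x)^2}
\left[\frac{\xi\,(1-x)}{1+x} - 2\right]. \\[4pt]
\text{At } T = T_{1/2}\ (x = 1):\quad
\frac{dC_P}{dT} &= -\frac{k_B\,\xi^2}{2\,T_{1/2}}
= -\frac{H_D^2}{2\,k_B\,(T_{1/2})^3} < 0 . \\[4pt]
\text{At } T = T_{\max}:\quad
\xi\,(1-x) &= 2\,(1+x)
\;\Longrightarrow\;
x = g_D\,e^{-\xi} = \frac{\xi-2}{\xi+2}, \\[4pt]
\xi &= \ln g_D - \ln\!\left(1 - \frac{4}{\xi+2}\right)
\approx \ln g_D + \frac{4}{\xi+2}
\approx \ln g_D + \frac{4}{2+\ln g_D}
\qquad (\ln g_D \gg 1).
\end{align*}
```

The last line uses $-\ln(1-\epsilon) \approx \epsilon$ and then replaces $\xi$ by its leading-order value $\ln g_D$ inside the small correction term, which is where the condition $\ln g_D \gg 1$ enters.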
Finally, since by Eqs. (A2) and (A4) $\langle H(T_{1/2})\rangle = \Delta H_{\rm cal}/2$, we have $T_d = T_{1/2}$. Therefore, for a strictly two-state model,

$$T_d = T_{1/2} > T_{\max}. \tag{A8}$$



We now turn to the various van't Hoff to calorimetric enthalpy ratios considered in the text [Eq. (6)]. Obviously, by definition $\kappa_0 = 1$ for the strictly two-state model. Moreover, by Eqs. (A3) and (A4),

$$2\,T_{1/2}\sqrt{k_B\, C_P(T_{1/2})} = H_D = \Delta H_{\rm cal}. \tag{A9}$$



Hence $\kappa_1 = \kappa_3 = 1$ as well, because $T_{1/2} = T_d$. On the other hand,

$$\kappa_2 = \frac{2\,T_{\max}\sqrt{k_B\, C_P(T_{\max})}}{H_D} = \sqrt{1 - 4\left(\frac{k_B T_{\max}}{H_D}\right)^2} < 1. \tag{A10}$$

However, for proteinlike systems, $H_D \gg k_B T$ is expected for any $T$ between 0 and 100°C, hence $T_{1/2} \approx T_{\max}$ and $\kappa_2 \approx 1$ are very good approximations. For instance, if we use the parameters in the text for $H_D$ and $g_D$, which were motivated by experimental data on CI2 (Fig. 3 of Ref. 54), we get $T_{1/2} = 336.190$ K, whereas $T_{\max} = 336.025$ K is only 0.17°C lower, and $\kappa_2 = 0.9997$. Therefore, for a strict two-state model with these proteinlike parameters, practically all three midpoint temperatures are identical, and all $\kappa$'s are equal to one.
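The numbers quoted above can be checked with a few lines of code. The following is a minimal sketch, assuming that the "parameters in the text" are the values given in the Fig. 1 caption ($H_D = 3 \times 10^4\,k_B$, $g_D = 5.68 \times 10^{38}$) and working in units where $k_B = 1$. The variable names and the fixed-point solver for Eq. (A6) are illustrative choices, not part of the original analysis.

```python
import math

# Assumed parameters (Fig. 1 caption); enthalpy in units of k_B, so k_B = 1.
H_D = 3.0e4      # denatured-state enthalpy
g_D = 5.68e38    # denatured-state degeneracy
ln_g = math.log(g_D)

def heat_capacity(T):
    """C_P / k_B of the strictly two-state model, Eq. (A3)."""
    x = g_D * math.exp(-H_D / T)   # statistical weight of the denatured level
    return (H_D / T) ** 2 * x / (1.0 + x) ** 2

# Eq. (A4): population midpoint temperature.
T_half = H_D / ln_g

# Eq. (A7): closed-form estimate of the heat capacity peak temperature.
T_max_approx = H_D / (ln_g + 4.0 / (2.0 + ln_g))

# Eq. (A6), g_D exp(-xi) = (xi - 2)/(xi + 2), rewritten as
# xi = ln g_D + ln[(xi + 2)/(xi - 2)] and solved by fixed-point iteration.
xi = ln_g
for _ in range(50):
    xi = ln_g + math.log((xi + 2.0) / (xi - 2.0))
T_max_exact = H_D / xi

# Eq. (A10): kappa_2 evaluated at the estimated peak temperature.
kappa_2 = math.sqrt(1.0 - 4.0 * (T_max_approx / H_D) ** 2)

print(f"T_1/2            = {T_half:.3f} K")
print(f"T_max (Eq. A7)   = {T_max_approx:.3f} K")
print(f"T_max (Eq. A6)   = {T_max_exact:.3f} K")
print(f"kappa_2          = {kappa_2:.4f}")
```

Under these assumptions the closed-form estimate of Eq. (A7) reproduces the quoted $T_{\max}$, and the iterative solution of Eq. (A6) differs from it by well under 0.01 K, so either can be used at the precision relevant here.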
Table I

$T_{\rm midpoint}$   $\Delta H_{\rm vH}/\Delta H_{\rm cal}$   references          $\Delta H_{\rm vH}/\Delta H_{\rm cal}$                                   references
$T_{1/2}$            $\kappa_0$, $\theta = [D]$               Ref. 23, Eq. (4)    $\kappa_1$                                                               Ref. 23
$T_{\max}$           $\kappa_2$                               Ref. 40, Eq. (39)   $(\kappa_2)^2$                                                           Ref. 40, Eq. (38); Ref. 41, Eq. (21)
$T_d$                $\kappa_3$                               Ref. 50, Eq. (7)    $(\kappa_3)^2$, $\theta = \langle\Delta H\rangle/\Delta H_{\rm cal}$     Ref. 51, Eq. (11); Ref. 22, Eq. (22)

Table I. Different definitions in the literature for $\Delta H_{\rm vH}/\Delta H_{\rm cal}$, the van't Hoff to calorimetric enthalpy ratio. $T_{\rm midpoint}$ is the midpoint temperature of the given definition(s); see Eq. (6) in the text. Equation numbers in the table are those in the example reference(s) in which a given formula is used or proposed. $\theta$'s are shown only for $\Delta H_{\rm vH}/\Delta H_{\rm cal}$'s that follow directly from Eq. (4). Note that $\kappa_0$, $\kappa_2$, $(\kappa_2)^2$, and $(\kappa_3)^2$ are equal, respectively, to the expressions "$\Delta H_{\rm vH}/\Delta H_{\rm cal}$," "$\Delta H_{\rm vH}^{C_P}/\Delta H_{\rm cal}$," "$\Delta H_{\rm vH}^{C_P(a)}/\Delta H_{\rm cal}$," and "$\Delta H_{\rm vH}^{C_P(a)\prime}/\Delta H_{\rm cal}$" in Ref. 23.
Table II

                                                  $\Delta H_{\rm vH}/\Delta H_{\rm cal}$
Model                       $\Delta H_{\rm cal}$   $\kappa_0$   $\kappa_1$   $\kappa_2$   $\kappa_3$   $\Omega_c$
(a) 2-letter (27mer)        68.5                   0.26         0.32         0.36         0.24         11.2
(b) 3-letter (27mer)        73.9                   0.36         0.43         0.46         0.31         20.7
(c) 20-letter (36mer)       15.0                   0.10         0.12         0.67         0.66         38.9
(d) Go (48mer)              55.2                   0.54         0.78         0.87         0.87         192
(e) Modified "HP" (36mer)   35.1                   0.17         0.23         0.33         0.31         12.4
(f) Sidechain (15mer)       11.6                   0.05         0.07         0.38         0.36         5.69

Table II. Calorimetric cooperativity of the lattice protein models in Figure 3. Thermodynamic quantities are deduced from Figures 4-9: $\kappa_0$ involves the population-based van't Hoff enthalpy, 23 which can be readily read off from the $\langle\Delta H\rangle_D$ curves. $\kappa_1$, $\kappa_2$, and $\kappa_3$ [Eq. (6)] are deduced from the $C_P$ functions, and $\Delta H_{\rm cal}$ is obtained by numerical integration of $C_P$ over $T$. The Klimov-Thirumalai 48 cooperativity parameter $\Omega_c$ is calculated for these models and included for comparison; the present $\Omega_c = 5.69$ for model (f) is slightly different from the value 5.32 reported by Klimov and Thirumalai. 48
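The procedure described in this caption (integrate $C_P$ over $T$ to get $\Delta H_{\rm cal}$, then evaluate the ratios at the relevant midpoint temperatures) can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes $\kappa_2 = 2T_{\max}\sqrt{k_B C_P(T_{\max})}/\Delta H_{\rm cal}$ and the analogous $\kappa_3$ at $T_d$, as implied by Eqs. (A9) and (A10), takes $T_d$ to be the temperature at which half of $\Delta H_{\rm cal}$ has been absorbed, and sets $k_B = 1$. The function names and the strictly two-state test curve are hypothetical.

```python
import math

def two_state_cp(T, H_D=3.0e4, g_D=5.68e38):
    """Test input: C_P/k_B of a strictly two-state model (hypothetical helper)."""
    x = g_D * math.exp(-H_D / T)
    return (H_D / T) ** 2 * x / (1.0 + x) ** 2

def cooperativity_from_cp(temps, cps):
    """Return (dH_cal, T_max, kappa_2, T_d, kappa_3) from a sampled C_P(T) curve, k_B = 1."""
    # Cumulative trapezoidal integral of C_P over T: the excess enthalpy <dH>(T).
    excess = [0.0]
    for i in range(1, len(temps)):
        step = 0.5 * (cps[i] + cps[i - 1]) * (temps[i] - temps[i - 1])
        excess.append(excess[-1] + step)
    dH_cal = excess[-1]

    # T_max: grid temperature at the heat capacity peak.
    i_max = max(range(len(cps)), key=cps.__getitem__)
    T_max = temps[i_max]
    kappa_2 = 2.0 * T_max * math.sqrt(cps[i_max]) / dH_cal

    # T_d: first grid temperature at which half of dH_cal has been absorbed.
    i_d = next(i for i, h in enumerate(excess) if h >= 0.5 * dH_cal)
    T_d = temps[i_d]
    kappa_3 = 2.0 * T_d * math.sqrt(cps[i_d]) / dH_cal
    return dH_cal, T_max, kappa_2, T_d, kappa_3

# Usage on the strictly two-state curve, sampled from 300 K to 380 K in 0.01 K steps.
temps = [300.0 + 0.01 * i for i in range(8001)]
cps = [two_state_cp(T) for T in temps]
dH_cal, T_max, kappa_2, T_d, kappa_3 = cooperativity_from_cp(temps, cps)
print(f"dH_cal = {dH_cal:.1f}, T_max = {T_max:.2f}, kappa_2 = {kappa_2:.4f}")
print(f"T_d = {T_d:.2f}, kappa_3 = {kappa_3:.4f}")
```

For the two-state test curve both ratios come out very close to one, as the Appendix predicts; for the lattice models tabulated above the same arithmetic applied to their simulated $C_P$ curves would give the much smaller values listed. A no-baseline analysis like this one integrates the full area under $C_P$, so the temperature window must be wide enough to capture the whole transition.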
Table III

Model                       $T_{\max}$   $C_{P,\max}$   $C_{P,\max}^{(s)}$   $\Delta H_{\rm vH}^{(s)}$   $\Delta H_{\rm cal}^{(s)}$   $\kappa_2^{(s)}$
(a) 2-letter (27mer)        1.35         80.6           69.5                 22.6                        24.2                         0.932
(b) 3-letter (27mer)        1.56         117            105                  32.0                        33.6                         0.952
(c) 20-letter (36mer)       0.282        316            294                  9.66                        10.3                         0.943
(d) Go (48mer)              0.764        986            965                  47.5                        47.3                         1.00
(e) Modified "HP" (36mer)   0.558        107            102                  11.3                        27.8                         0.406
(f) Sidechain (15mer)       0.268        66.4           59.9                 4.14                        7.75                         0.535

Table III. Effects of baseline subtractions on the predicted calorimetric cooperativities of the six lattice protein models considered in this work: The effective van't Hoff to calorimetric enthalpy ratio (right column) is equal to $\Delta H_{\rm vH}^{(s)}/\Delta H_{\rm cal}^{(s)}$ [Eq. (9)]. The definitions of all quantities tabulated and methods to determine them are described in the text, Figure 11, and upper panels of Figures 12 and 13.
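The entries of Table III are mutually consistent with a simple recipe, which the sketch below makes explicit. It assumes (this is an inference from the tabulated numbers and from Eq. (A10), not a quotation of Eq. (9)) that the effective van't Hoff enthalpy is $\Delta H_{\rm vH}^{(s)} = 2T_{\max}\sqrt{k_B C_{P,\max}^{(s)}}$ in model units with $k_B = 1$, and that the columns are ordered as $T_{\max}$, $C_{P,\max}$, $C_{P,\max}^{(s)}$, $\Delta H_{\rm vH}^{(s)}$, $\Delta H_{\rm cal}^{(s)}$, $\kappa_2^{(s)}$. The function name is hypothetical.

```python
import math

# (model, T_max, baseline-subtracted peak C_P, tabulated dH_vH(s), tabulated dH_cal(s), tabulated kappa_2(s))
rows = [
    ("(a) 2-letter",    1.35,  69.5, 22.6, 24.2, 0.932),
    ("(b) 3-letter",    1.56,  105., 32.0, 33.6, 0.952),
    ("(c) 20-letter",   0.282, 294., 9.66, 10.3, 0.943),
    ("(d) Go",          0.764, 965., 47.5, 47.3, 1.00),
    ("(e) Modified HP", 0.558, 102., 11.3, 27.8, 0.406),
    ("(f) Sidechain",   0.268, 59.9, 4.14, 7.75, 0.535),
]

def effective_ratio(T_max, cp_peak_s, dH_cal_s):
    """Assumed recipe: dH_vH(s) = 2 T_max sqrt(C_P,max(s)) with k_B = 1, then divide by dH_cal(s)."""
    dH_vH_s = 2.0 * T_max * math.sqrt(cp_peak_s)
    return dH_vH_s, dH_vH_s / dH_cal_s

results = []
for name, T_max, cp_s, vH_tab, cal_tab, kappa_tab in rows:
    vH, kappa = effective_ratio(T_max, cp_s, cal_tab)
    results.append((name, vH, vH_tab, kappa, kappa_tab))
    print(f"{name:16s} dH_vH(s) = {vH:6.2f} (table {vH_tab}), kappa_2(s) = {kappa:.3f} (table {kappa_tab})")
```

With the rounded inputs of the table, the recomputed values agree with the tabulated ones to within about one percent, the residual being attributable to rounding of $T_{\max}$ and $C_{P,\max}^{(s)}$.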
Figure Captions

Fig. 1 Densities of states $g(H)$ of random energy models. Each parabolic curve is $\ln g(H)$ from Eq. (7) with $H_D = 3 \times 10^4$ (vertical dashed lines), $g_D = 5.68 \times 10^{38}$, as described in the text, and $H$ is in units of $k_B$. The $\kappa_0$ values of these curves, 0.6, 0.80, 0.95, and 0.98, quantify the different degrees of cooperativity of four models given here as examples, with standard deviations of denatured enthalpy $\sigma_H$ = 1800, 1350, 700, and 440 $k_B$ respectively. $\kappa_0$'s are the population-based 23 $\Delta H_{\rm vH}/\Delta H_{\rm cal}$ ratios. The horizontal dashed line highlights the fact that for these models it is possible for $g(H) < 1$; and the dot indicates that their unique native (N) states have zero enthalpy [$\delta$-function in Eq. (7)]. Note that the logarithmic scale along the vertical axis implies that a 0.693 decrease in $\ln g$ is equivalent to halving the value of $g$ itself. Hence the distribution of $g$ is much sharper than this logarithmic plot might have otherwise conveyed.

Fig. 2 Relationship among different calorimetric two-state criteria in the random energy models defined by Eq. (7). See text and Table I for definitions and references. Left column: (a) Midpoint transition temperatures and (b) van't Hoff to calorimetric enthalpy ratios, as functions of the standard deviation $\sigma_H$ of denatured enthalpy distribution. (b) shows $\kappa$'s vs. $\sigma_H$ times a constant, so that the horizontal scale corresponds to Onuchic et al.'s expression 9 for $T_g/T_f$. We note that $\kappa_0$ in (b) is well approximated by Eq. (13) of Ref. 23. Right column: Experimental formulas for $\Delta H_{\rm vH}/\Delta H_{\rm cal}$ vs. the population-based $\kappa_0$ used in our theoretical analyses.

Fig. 3 Recent three-dimensional cubic lattice protein models considered in this paper for their conformities to the calorimetric two-state criterion. Monomers (residues) are numbered from one end of the chain to the other; monomer 1 corresponds to the leftmost letter of a sequence. Each model protein chain is shown in its unique native or ground-state (lowest-enthalpy) structure. The corresponding sequence is also included, except for the Go model in (d), as the interactions of a Go model are determined solely by the ground-state conformation it presumes. (a) A 2-letter model of Socci and Onuchic (sequence 002 in Table 1 of Ref. 44). (b) A 3-letter model of Socci et al. (sequence in Fig. 3 of Ref. 45). (c) A 20-letter model of Gutin et al. (sequence in Fig. 1 of Ref. 49). (d) A Go model of Pande and Rokhsar (structure in Fig. 1 of Ref. 46). (e) A modified HP "solvation" model of Sorenson and Head-Gordon (sequence 6 in Table 1 of Ref. 47). Filled and open circles represent the H and P monomers, respectively, in this modified HP model. (f) A 20-letter sidechain model of Klimov and Thirumalai (sequence A in Fig. 1 of Ref. 48). Here the main-chain monomers are numbered, and sidechains are represented by grey circles.



Fig. 4 Thermodynamic cooperativity of the 2-letter model in Fig. 3a. Results are obtained by the Monte Carlo (MC) histogram technique using simulation at $T = 1.5$. [N] and [D] are respectively the fractional native and denatured population, [N] + [D] = 1. In this figure and subsequent Figs. 5-9, the native state of each model is taken to be only its single ground-state (lowest $H$) conformation, and the denatured state consists of all other conformations. 23 The vertical lines give the midpoint temperatures. From left to right, they are $T_{1/2}$ when [N] = [D] = 1/2 (dashed line), $T_{\max}$, and $T_d$ (solid lines). In all six models studied here (Figs. 4-9), $T_{1/2} < T_{\max} < T_d$. Upper panel: the specific heat capacity $C_P$ is defined by Eq. (2) in the text; $(C_P)_D$ is the specific heat capacity of the denatured ensemble, obtained by replacing the Boltzmann averages $\langle\ldots\rangle$ in Eq. (2) over the full ensemble by averages $\langle\ldots\rangle_D$ over the denatured (nonnative) ensemble. 23 Lower panel: The excess heat function $\langle\Delta H\rangle$ (solid curve increasing with $T$) is given by Eq. (1) in the text, and $\langle\Delta H\rangle_D$ (dashed curve) is the corresponding average over the denatured ensemble; 23 both are normalized by (in units of) $\Delta H_{\rm cal}$ obtained by numerical integration of the entire area under the $C_P$ curve, part of which is shown in the upper panel. Our results for $C_P$ and $\langle\Delta H\rangle$ are numerically consistent with the $C_V$ and $\langle E\rangle$ functions in Figs. 10 and 9 of the original study. 72
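The caption does not spell out the histogram technique. As orientation only, the sketch below shows the standard single-histogram reweighting idea: enthalpy samples collected at one simulation temperature $T_0$ are reweighted by $\exp[-(1/T - 1/T_0)H]$ to estimate averages at a nearby temperature $T$ (with $k_B = 1$). It assumes that Eq. (2) is the usual fluctuation formula $C_P = (\langle H^2\rangle - \langle H\rangle^2)/(k_B T^2)$; the function name and the toy samples are hypothetical, and this is not the authors' implementation.

```python
import math

def reweight(samples, T0, T):
    """Estimate <H> and C_P at temperature T from enthalpy samples drawn at T0 (k_B = 1)."""
    dbeta = 1.0 / T - 1.0 / T0
    log_w = [-dbeta * h for h in samples]
    shift = max(log_w)                      # subtract the maximum to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    norm = sum(w)
    mean_h = sum(wi * h for wi, h in zip(w, samples)) / norm
    mean_h2 = sum(wi * h * h for wi, h in zip(w, samples)) / norm
    cp = (mean_h2 - mean_h ** 2) / T ** 2   # fluctuation formula for the heat capacity
    return mean_h, cp

# Toy usage: four samples "collected" at T0 = 1, reweighted to T = 1 and T = 2.
samples = [0.0, 0.0, 1.0, 1.0]
mean_same, cp_same = reweight(samples, T0=1.0, T=1.0)
mean_hot, cp_hot = reweight(samples, T0=1.0, T=2.0)
print(mean_same, cp_same)
print(mean_hot, cp_hot)
```

Reweighting is reliable only for temperatures whose important enthalpies were actually visited at $T_0$, which is presumably why each model in Figs. 4-9 is simulated at a temperature close to its own transition region.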

Fig. 5 Same as Fig. 4, but for the 3-letter model in 
Fig. 3b; obtained by the MC histogram technique from 
simulation at T = 1.5. 

Fig. 6 Same as Fig. 4, but for the 20-letter model in 
Fig. 3c; obtained by the MC histogram technique from 
simulation at T = 0.27. 

Fig. 7 Same as Fig. 4, but for the Go model in Fig. 3d; 
all continuous curves are obtained by the MC histogram 
technique from simulation at T = 0.75. For this model, 
$T_{\max}$ (0.764) is almost equal to $T_d$ (0.767). Black dots in
the lower panel are fractional native populations [N] at 
six different temperatures computed by direct MC sim- 
ulations, showing good agreement with results from the 
histogram method. 

Fig. 8 Same as Fig. 4, but for the modified HP "sol- 
vation" model in Fig. 3e; obtained by the MC histogram 
technique from simulation at T = 0.6. Our simulated 
$C_P$ function (upper panel) is consistent with the original
simulation ($C_V$ of sequence 6 in Fig. 8 of Ref. 47).

Fig. 9 Same as Fig. 4, but for the 20-letter sidechain model in Fig. 3f; obtained by the MC histogram technique from simulation at $T = 0.25$. The $C_P$ function in the upper panel is consistent with the original heat capacity simulation ($C_V$ in Fig. 2c of Ref. 48). Our results are also consistent with the thermodynamic properties $\langle\chi\rangle$, $\Delta\chi$, and $P_{\rm NBA}$ given by Klimov and Thirumalai 48 in their Fig. 2 (data not shown).

Fig. 10 Distributions of denatured (nonnative) enthalpy $H$ of the 48mer Go model in Figs. 3d and 7 at different temperatures $T$, obtained by direct MC simulations (same temperatures as the black dots in Fig. 7). The native enthalpy is $-57$. The total area under a distribution curve is proportional to the fractional denatured population [D] at the given temperature.

Fig. 11 Exploring effects of baseline subtractions on predicted calorimetric cooperativity. Ad hoc baseline subtractions are applied to the heat capacity functions of the 2-letter (a), Go (b), modified HP (c), and 20-letter sidechain (d) models. The model heat capacities ($C_P$'s) are the same as those presented in Figures 4 and 7-9. In each plot, the shaded area is subtracted from the original (pre-subtraction) $\Delta H_{\rm cal}$ to yield a new effective calorimetric enthalpy $\Delta H_{\rm cal}^{(s)}$ ($< \Delta H_{\rm cal}$). Native and denatured baselines with non-zero slopes are constructed for (b) and (d). Denatured baselines with negative slopes are provided for (a) and (c), but their native baselines are assumed to have zero slope (i.e., no new native baseline) because the significant curvatures of their $C_P$ functions at low temperatures do not appear to warrant linear positive-slope extrapolations. Solid vertical lines mark the temperature $T_{\max}$ at the peak of heat capacity functions; the black dot marks the arithmetic mean of the values of native and denatured baselines at $T_{\max}$. Following standard experimental calorimetric baseline procedures 50,51 (see also Ref. 22), the new effective heat capacity peak value $C_{P,\max}^{(s)}$ is given by the vertical measure between the black dot and the pre-subtraction $C_{P,\max} = C_P(T_{\max})$. The quantities $C_{P,\max}^{(s)}$ and $\Delta H_{\rm cal}^{(s)}$ are then used to compute the new effective van't Hoff to calorimetric enthalpy ratios $\kappa_2^{(s)}$ in Table III. Included for comparison are nonlinear "formal two-state" baselines (dotted curves) constructed using the method of Zhou et al. 22 Nonlinear baselines correspond to heat capacity functions $(C_P)_0$ and $(C_P)_1$ of the native and denatured ensembles respectively. No native nonlinear baseline is provided for (a) - (c) because each of their native states is taken to have only a single conformation, as in the original analyses. 44,46,47 Hence $(C_P)_0 = 0$ and $(C_P)_1 = (C_P)_D$ for (a) - (c). On the other hand, for the 20-letter sidechain model in (d), the nonlinear native baseline is calculated 22 from a multiple-conformation native state defined by the original authors. 48 Vertical dashed lines mark the temperature $T_m$. For (a) - (c), $T_m = T_{1/2}$; for (d), $T_m$ is the temperature at which one half of the chain population is in the multiple-conformation native state ("native basin of attraction") defined in Ref. 48. See the text for further details.

Fig. 12 Thermodynamic/calorimetric cooperativity of a 3-letter model. (a) Same as Figure 11a, but for the 3-letter model of Socci et al. 45 in Figures 3b and 5. (b) Root-mean-square radius of gyration $R_g$ of this 3-letter chain model vs. temperature. (Square root of the Boltzmann average of square radius of gyration of the chains.) $R_g$ continues to increase substantially as temperature is raised well above the transition region (vertical dashed and solid lines).

Fig. 13 Thermodynamic/calorimetric cooperativity of a 20-letter model. Upper panel: Same as Figure 11d, but for the 20-letter model of Gutin et al. 49 in Figures 3c and 6. As in Figure 11d, the vertical dashed line marks the temperature $T_m$ at which one half of the chain population is in the multiple-conformation native state defined by the original authors as the ensemble of conformations that have more than 20 contacts that also occur in the ground-state conformation ($Q > 20$, $Q$ is referred to as the number of native contacts). 49 For this 36mer model, the total number $Q_N$ of native contacts equals 40. The corresponding native and denatured nonlinear baselines are calculated using the method of Zhou et al. 22 Lower panel: Folding/denaturation transition tracked by different order parameters. [N] is the fractional chain population in the single-conformation ground state; $\langle Q\rangle/Q_N$ is the normalized Boltzmann-averaged number of native contacts; $P(Q > 20)$ is the fractional population in the multiple-conformation native state; and $\langle\chi\rangle$ is the Boltzmann average of the overlap function $\chi$ of Thirumalai and coworkers, which is a useful measure of the structural similarity between any given conformation and the ground-state conformation. The single ground-state conformation has $Q/Q_N = 1$ and $\chi = 0$. The inset in the upper panel shows the relation between $Q/Q_N$ and $\chi$. While each $Q$ is consistent with many values of $\chi$, and vice versa (scatter plot), for this model the correlation between their Boltzmann averages at different temperatures is almost perfect (curve in inset with slope $\approx -1$).

Fig. 14 Conformational diversity in the multiple-lattice-conformation native state of the 20-letter model in Figures 3c, 6 and 13. In each conformation, the directionality of the sequence is indicated by the filled circle, which marks the position of monomer 1 in Figure 3c. The three rows show example non-ground-state conformations (from top to bottom) with number of native contacts $Q$ = 34, 35, and 36 respectively. These $Q$ values are close to the average $Q$ of the multiple-conformation native state at the midpoint temperatures $T_m$ and $T_{\max}$ (vertical dashed and solid lines in Figure 13). 49

Fig. 15 Effects of the folding/denaturation transition on conformational properties of the 20-letter model in Figures 3c, 6 and 13. The dashed lines on the left mark $T_{1/2}$ at which the fractional population [N] of the single ground-state conformation equals 1/2; the dashed lines on the right mark $T_m \approx T_{\max}$ (see Figure 13). Upper panel: Boltzmann-averaged number of nonnative contacts (i.e., contacts that do not belong to the single ground-state conformation) vs. temperature. Lower panel: Root-mean-square radius of gyration vs. temperature. (Same as Figure 12b, but now for the 20-letter model.)